ON THE ASYMPTOTIC EFFICIENCY OF ABC ESTIMATORS


By Wentao Li and Paul Fearnhead 
Lancaster University 

Many statistical applications involve models for which it is difficult
to evaluate the likelihood, but relatively easy to sample from.
Approximate Bayesian computation (ABC) is a likelihood-free method
for implementing Bayesian inference in such cases. It avoids evaluating
the likelihood function and generates samples from an approximate
posterior distribution by jointly simulating the parameter and
the data, and accepting parameter values that give simulated data
close to the observed data. We present results on the asymptotic
variance of ABC estimators in a large-data limit. Our key assumption
is that we summarise the data by a fixed dimensional summary
statistic and that this summary statistic obeys a central limit theorem.
We prove asymptotic normality of the ABC posterior mean.
This improves on recent results on consistency for the ABC posterior
mean, and in particular specifies its rate of convergence. This result
also shows that, in terms of asymptotic variance, we should use a
summary statistic that is the same dimension as the parameter vector,
p; and that any summary statistic of higher dimension can be
reduced, through a linear transformation, to dimension p in a way
that can only reduce the asymptotic variance of the ABC posterior
mean. We then look at how the Monte Carlo error of an importance
sampling algorithm that samples from the ABC posterior affects the
accuracy of the ABC estimator. We give conditions on the importance
sampling proposal distribution such that the variance of the
ABC estimator will be the same order as that of the MLE based on
the summary statistics used by ABC. This result suggests an iterative
importance sampling algorithm, which we then evaluate empirically
on a stochastic volatility model.


1. Introduction. There are many statistical applications which involve inference about
models that are easy to simulate from, but for which it is difficult, or impossible, to calculate
likelihoods. In such situations it is possible to use the fact we can simulate from the
model to enable us to perform inference. There is a wide class of such likelihood-free methods
of inference including indirect inference [19, 20], the bootstrap filter [18], the simulated
method of moments [16], and synthetic likelihood [36].

We consider a Bayesian version of these methods, termed Approximate Bayesian Computation
(ABC). This approach involves defining an approximation to the posterior distribution
in such a way that it is possible to sample from this approximate posterior using
only the ability to sample from the model. Arguably the first ABC method was that of
[29], and these methods have been popular within population genetics [5, 11, 34], ecology
[3] and systematic biology [33, 30]. More recently, there have been applications of ABC to
other areas including stereology [9], stochastic differential equations [27], finance [26] and
cosmology [22].

Algorithm 1: Importance and Rejection Sampling ABC

1. Simulate $\theta_1, \ldots, \theta_N \sim q_n(\theta)$;

2. For each $i = 1, \ldots, N$, simulate $Y^{(i)} = (y_1^{(i)}, \ldots, y_n^{(i)}) \sim f_n(y \mid \theta_i)$;

3. For each $i = 1, \ldots, N$, accept $\theta_i$ with probability $K_\varepsilon(s_n^{(i)} - s_{\mathrm{obs}})$, where $s_n^{(i)} = s_n(Y^{(i)})$;
and define the associated weight as $w_i = \pi(\theta_i)/q_n(\theta_i)$.

Let $K(x)$ be a density kernel, where $\max_x K(x) = 1$, and $\varepsilon > 0$ be a bandwidth.
Denote the data as $Y_{\mathrm{obs}} = (y_{\mathrm{obs},1}, \ldots, y_{\mathrm{obs},n})$. Assume we have chosen a finite dimensional
summary statistic $s_n(Y)$, and denote $s_{\mathrm{obs}} = s_n(Y_{\mathrm{obs}})$. If we model the data as a draw from
a parametric density, $f_n(y \mid \theta)$, and assume a prior, $\pi(\theta)$, then we define the ABC posterior as

(1)    $\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon) \propto \pi(\theta) \int f_n(s_{\mathrm{obs}} + \varepsilon v \mid \theta) K(v)\, dv,$

where $f_n(s \mid \theta)$ is the density for the summary statistic implied by $f_n(y \mid \theta)$. Let
$f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon) = \int f_n(s_{\mathrm{obs}} + \varepsilon v \mid \theta) K(v)\, dv$. This framework encompasses most
implementations of ABC. In particular, the use of the uniform kernel corresponds to the popular
rejection-based rule for the ABC algorithm [5].

The idea is that $f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)$ is an approximation of the likelihood. The ABC
posterior, which is proportional to the prior multiplied by this likelihood approximation, is
an approximation of the true posterior. The likelihood approximation can be interpreted
as a measure of, on average, how close the summary, $s_n$, simulated from the model is to
the summary for the observed data, $s_{\mathrm{obs}}$. The choices of kernel and bandwidth affect the
definition of “closeness”.

By defining the approximate posterior in this way, we can simulate samples from it
using standard Monte Carlo methods. One approach, that we will focus on later, uses
importance sampling. Let $K_\varepsilon(x) = K(x/\varepsilon)$. Given a proposal density, $q_n(\theta)$, a bandwidth,
$\varepsilon$, and a Monte Carlo sample size, $N$, the importance sampling ABC (IS-ABC) would
proceed as in Algorithm 1. The set of accepted parameters and their associated weights
provides a Monte Carlo approximation to $\pi_{\mathrm{ABC}}$. Note that if we set $q_n(\theta) = \pi(\theta)$ then this
is just a rejection sampler with the ABC posterior as its target, which is called rejection
ABC in this paper. In practice sequential importance sampling methods are often used to
learn a good proposal distribution [4].
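
The three steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: the toy model is i.i.d. $N(\theta, 1)$ data summarised by the sample mean (so the summary can be simulated directly as $N(\theta, 1/n)$), the kernel is the Gaussian-shaped $K(v) = \exp(-v^2/2)$ with maximum 1, and the function name `is_abc` and its arguments are hypothetical.

```python
import math
import random


def is_abc(s_obs, n, eps, N, prior_pdf, proposal_pdf, proposal_sample, rng):
    """Sketch of importance-sampling ABC for a toy model (an assumption):
    y_1..y_n i.i.d. N(theta, 1), summary = sample mean, K(v) = exp(-v^2 / 2).

    Returns the accepted parameter values with their importance weights.
    """
    accepted = []
    for _ in range(N):
        # Step 1: propose a parameter value from q_n.
        theta = proposal_sample(rng)
        # Step 2: simulate the summary; the mean of n N(theta, 1) draws
        # is exactly N(theta, 1/n), so we sample it directly.
        s_sim = rng.gauss(theta, 1.0 / math.sqrt(n))
        # Step 3: accept with probability K_eps(s_sim - s_obs) = K((s_sim - s_obs)/eps).
        accept_prob = math.exp(-0.5 * ((s_sim - s_obs) / eps) ** 2)
        if rng.random() < accept_prob:
            weight = prior_pdf(theta) / proposal_pdf(theta)
            accepted.append((theta, weight))
    return accepted


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
```

Passing the prior as both `prior_pdf` and `proposal_pdf` gives rejection ABC, in which case every weight equals 1.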

There are three choices in implementing ABC: the choice of summary statistic, the choice
of bandwidth, and the specifics of the Monte Carlo algorithm. For importance sampling,
the last of these involves specifying the Monte Carlo sample size, $N$, and the proposal
density, $q_n(\theta)$. These, roughly, relate to three sources of approximation in ABC. To see
this note that as $\varepsilon \to 0$ we would expect the ABC posterior to converge to the posterior
given $s_{\mathrm{obs}}$ [17]. Thus the choice of summary statistic governs the approximation, or loss
of information, between using the full posterior distribution and using the posterior given
the summary. The value $\varepsilon$ then affects how close the ABC posterior is to the posterior
given the summary. Finally there is then Monte Carlo error from approximating the true
ABC posterior with a Monte Carlo sample. The Monte Carlo error is not only affected by
the specifics of the Monte Carlo algorithm, but also by the choices of summary statistic
and bandwidth, which together affect, say, the probability of acceptance in step 3 of the
above importance sampling algorithm. Having a higher dimensional summary statistic, or
a smaller value of $\varepsilon$, will tend to reduce this acceptance probability and hence increase the
Monte Carlo error.

This work aims to study the interaction between all three sources of error, in the case
where the summary statistics obey a central limit theorem (CLT) for large $n$. We are
particularly interested in the efficiency of ABC, where by efficiency we mean that ABC has
the same rate of convergence as the MLE for the parameter given the summary statistic. In
particular this work is motivated by the question of whether ABC can be efficient as $n \to \infty$
if we have a fixed Monte Carlo sample size. Intuitively this appears unlikely. For efficiency
we will need $\varepsilon \to 0$ as $n \to \infty$, and this corresponds to an increasingly strict condition for
acceptance. Thus we may imagine that the acceptance probability will necessarily tend to 0
as $n$ increases, and thus we will need an increasing Monte Carlo sample size to compensate
for this.

However our results show that IS-ABC can be efficient if we choose an appropriate
proposal distribution. The proposal distribution needs to have a suitable scale and location
and have appropriately heavy tails. This can be achieved through an iterative procedure
that learns the location and scale of the ABC posterior, and uses these as the basis of
location and scale parameters for, say, a t-distributed proposal distribution. If we use an
appropriate proposal distribution and have a summary statistic of the same dimension as
the parameter vector we obtain that the ABC posterior mean is asymptotically unbiased
with a variance that is $1 + O(1/N)$ times that of the MLE based on the summary. This
is similar to asymptotic results for indirect inference [19, 20], an alternative likelihood-free
method. Our results also lend theoretical support to methods that choose the bandwidth
indirectly through specifying the proportion of samples that are accepted. This approach
leads to a bandwidth which is of the optimal order in $n$.

To obtain this result we first prove a Bernstein-von Mises type theorem for the ABC
posterior mean. This is a non-standard convergence result as it is based on the partial
information contained in the summary statistics. For related convergence results see [10]
and [37]. However, this earlier work does not consider the case when the dimension of
the summary statistic is larger than that of the parameter, which is commonplace in real-life
applications of ABC. Dealing with a summary statistic of higher dimension than the
parameter vector introduces extra technical challenges. This is because previous proofs,
based on the density of the summary, cannot be generalised, as we now require densities on the
lower dimensional manifold that is generated by projecting from the summary to the
parameter [37].

The convergence result we obtain for the ABC posterior mean has two practically important
consequences. The first is that it shows that any $d$ dimensional summary with
$d > p$ can be reduced to a $p$ dimensional summary statistic without any loss of information,
in that the posterior mean based on the reduced summary has the same asymptotic
distribution as that based on the original summary. Furthermore it shows that using ABC
with a summary statistic of dimension $d > p$ can lead to an increased bias. This in turn
means that the asymptotic variance of the ABC posterior mean can be larger than if the
reduced summary was used. This advantage of using a summary of dimension $p$ complements
previous arguments for such a choice [17], which were based around reducing Monte
Carlo variance.

This paper adds to a growing literature looking at the theoretical properties of ABC.
Initial results focussed on the bias of ABC, defined as $h_{\mathrm{ABC}} - E[h(\theta) \mid Y_{\mathrm{obs}}]$ where $h_{\mathrm{ABC}}$
is the ABC posterior mean, and the Monte Carlo variance of estimating $h_{\mathrm{ABC}}$. The bias
converges to $E[h(\theta) \mid s_{\mathrm{obs}}] - E[h(\theta) \mid Y_{\mathrm{obs}}]$ as the bandwidth decreases to 0, hence the ABC
estimator is consistent if $s_{\mathrm{obs}}$ is sufficient [1, 17, 14]. The convergence rate of the bias is
found to be as small as $O(\varepsilon^2)$ in various settings [13, 1]. These results can then be used
to consider how the choice of $\varepsilon$ should depend on the Monte Carlo sample size so as to
trade off ABC bias and Monte Carlo variability [7, 1, 6]. They have also been used to give
conditions for the ABC bias to be negligible when compared to the asymptotic variance
of the posterior mean, and to guide the selection of the bandwidth. For example, when the
observations are i.i.d., insights from [35] suggest that an upper bound for the bandwidth
is $o(n^{-1/2})$; whilst for observations from a hidden Markov model, [13] relaxed the upper
bound to be $o(n^{-1/4})$ in the case where the full dataset is used as the summary statistic.
[13] further shows that the ABC posterior distribution can be arbitrarily close to the true
posterior distribution as the bandwidth goes to 0 if the full data is used as a summary
statistic.

There has also been work looking at the consistency of ABC as we obtain more data.
[23] consider consistency in ABC model choice and [24] consider consistency of ABC for
parameter estimation. Both results show consistency under weaker assumptions than we
make. However our Theorem 3.1 gives a rate of convergence for ABC, shows how this
depends on the choice of bandwidth, and shows how the asymptotic variance depends on
the summary statistics. Also here it is natural to focus on assumptions similar to those of the
classical Bernstein-von Mises theorem, since our purpose is to compare the efficiency with
the likelihood-based estimators.

Finally, a number of papers have looked at the choice of summary statistics [e.g. 34,
25, 17, 2, 7, 28]. Whilst this is not the focus of our paper, Theorem 3.1 does give insight
into this choice. As mentioned above, this result shows that, in terms of minimising the
asymptotic variance, we should use a summary that is of the same dimension as the number
of parameters. In particular it further supports the suggestion in [17] of having one summary
per parameter, with that summary approximating the MLE for that parameter (viewed
as a function of the data). If we were to use the true MLEs as the summaries, then it
follows from Theorem 3.1 that asymptotically the ABC posterior mean would attain the
Cramér-Rao lower bound.

The paper is organised as follows. Section 2 sets up some notation and presents the
key assumptions for the main theorems. Section 3 gives the asymptotic normality of the
ABC posterior mean of $h(\theta)$ for $n \to \infty$. Section 4.1 gives the asymptotic normality of $\hat{h}$
when $N \to \infty$. In Section 4.2, the relative asymptotic efficiency between the MLE based
on the summary statistic and $\hat{h}$ is studied for various proposal densities. An iterative
importance sampling algorithm is proposed and the comparison between ABC and
indirect inference (II) is given. In Section 5 we demonstrate our results empirically on an
analytically tractable normal example and a stochastic volatility model. The paper concludes
with some discussion. Proofs are contained in the Appendices.

2. Notation and Set-up. Denote the data by $Y_{\mathrm{obs}} = (y_{\mathrm{obs},1}, \ldots, y_{\mathrm{obs},n})$, where $n$
is the sample size, and each observation, $y_{\mathrm{obs},i}$, can be of arbitrary dimension. We will be
considering the asymptotics as $n \to \infty$, and thus denote the density of $Y_{\mathrm{obs}}$ by $f_n(y \mid \theta)$.
This density depends on an unknown parameter $\theta$. We will let $\theta_0$ denote the true parameter
value, and $\pi(\theta)$ the prior distribution for the parameter. Let $p$ be the dimension of $\theta$ and
$\mathcal{P}$ be the parameter space. For a set $A$, let $A^c$ be its complement with respect to the whole
space.

We assume that $\theta_0$ is in the interior of the parameter space:

(C1) There exists some $\delta_0 > 0$, such that $\mathcal{P}_0 = \{\theta : |\theta - \theta_0| < \delta_0\} \subset \mathcal{P}$.

To implement ABC we will use a summary statistic of the data, $s_n(Y) \in \mathbb{R}^d$; for example
a vector of sample means of appropriately chosen functions of the data. This summary
statistic will be of fixed dimension, $d$, as we vary $n$. The density for $s_n(Y)$, implied by
the density for the data, will depend on $n$, and we denote this by $f_n(s \mid \theta)$. We will use the
shorthand $S_n$ to denote the random variable with density $f_n(s \mid \theta)$. In ABC we use a kernel,
$K(x)$, with $\max_x K(x) = 1$, and a bandwidth $\varepsilon > 0$. As we vary $n$ we will often wish to
vary $\varepsilon$, and in these situations denote the bandwidth by $\varepsilon_n$. For the importance sampling
algorithm we require a proposal distribution, $q_n(\theta)$, and allow for this to depend on $n$. We
assume the following conditions on the kernel:

(C2) (i) $\int v K(v)\, dv = 0$;

(ii) $\int \prod_{k=1}^{l} v_{i_k} K(v)\, dv < \infty$ for any coordinates $(v_{i_1}, \ldots, v_{i_l})$ of $v$ and $l \le p + 6$;

(iii) $K(v) = \bar{K}(\|v\|_\Lambda^2)$ where $\|v\|_\Lambda^2 = v^T \Lambda v$ and $\Lambda$ is a diagonal matrix, and $K(v)$ is
a decreasing function of $\|v\|_\Lambda$;

(iv) $K(v) = O(e^{-c_1 \|v\|^{\alpha_1}})$ for some $\alpha_1 > 0$ and $c_1 > 0$ as $\|v\| \to \infty$,

which are satisfied by all commonly used kernels in ABC.

For a real function $g(x)$ denote its $k$th partial derivative at $x = x_0$ by $D_{x_k} g(x_0)$, the
gradient function by $D_x g(x_0)$ and the Hessian matrix by $H_x g(x_0)$. To simplify the notation,
$D_{\theta_k}$, $D_\theta$ and $H_\theta$ are written as $D_k$, $D$ and $H$ respectively. For a series $x_n$, besides the limit
notations $O(\cdot)$ and $o(\cdot)$, we use the notations that for large enough $n$, $x_n = \Theta(a_n)$ if there
exist constants $m$ and $M$ such that $0 < m < |x_n/a_n| < M < \infty$, and $x_n = \Omega(a_n)$ if
$|x_n/a_n| \to \infty$. For two square matrices $A$ and $B$, we say $A \le B$ if $B - A$ is positive
semi-definite.

The asymptotic results are based around assuming a CLT for the summary statistic. 

(C3) There exists a sequence $a_n$, with $a_n \to \infty$ as $n \to \infty$, a $d$-dimensional vector $s(\theta)$
and a $d \times d$ matrix $A(\theta)$, such that for all $\theta \in \mathcal{P}$,

    $a_n (S_n - s(\theta)) \xrightarrow{d} N(0, A(\theta))$, as $n \to \infty$.

Furthermore, that

(i) $s(\theta)$ and $A(\theta) \in C^1(\mathcal{P}_0)$, and $A(\theta)$ is positive definite for any $\theta$;

(ii) $s(\theta) = s(\theta_0)$ if and only if $\theta = \theta_0$; and

(iii) $I(\theta) = Ds(\theta)^T A^{-1}(\theta) Ds(\theta)$ has full rank at $\theta = \theta_0$.

Under condition (C3) we have that $a_n$ is the rate of convergence in the central limit theorem.
If the data are independent and identically distributed, and the summaries consist of sample
means of functions of the data, then $a_n = n^{1/2}$. Part (ii) of this condition is required for the
true parameter to be identifiable given only the summary of data. Furthermore, $I^{-1}(\theta_0)/a_n^2$
is the asymptotic variance of the MLE based on the summary (henceforth MLES) for $\theta$
and therefore is required to be valid at the true parameter.
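
As a concrete check of the quantities in (C3), consider a toy model that is an assumption made here for illustration and is not taken from the paper: $y_1, \ldots, y_n$ i.i.d. $N(\theta, 1)$ with $p = 1$, summarised by the first two sample moments, so that $d = 2 > p$. The standard normal moments give:

```latex
% Toy example (assumption): y_i ~ N(theta, 1), s_n(Y) = (mean of y_i, mean of y_i^2).
\begin{align*}
a_n &= n^{1/2}, \qquad
s(\theta) = \begin{pmatrix} \theta \\ \theta^2 + 1 \end{pmatrix}, \qquad
A(\theta) = \begin{pmatrix} 1 & 2\theta \\ 2\theta & 4\theta^2 + 2 \end{pmatrix}, \\
Ds(\theta) &= \begin{pmatrix} 1 \\ 2\theta \end{pmatrix}, \qquad
A(\theta)^{-1} = \frac{1}{2}\begin{pmatrix} 4\theta^2 + 2 & -2\theta \\ -2\theta & 1 \end{pmatrix}, \\
I(\theta) &= Ds(\theta)^T A(\theta)^{-1} Ds(\theta)
  = \tfrac{1}{2}\left\{ (4\theta^2 + 2) - 8\theta^2 + 4\theta^2 \right\} = 1 .
\end{align*}
```

Here $A(\theta)$ has determinant 2, so it is positive definite, $s(\theta)$ identifies $\theta$ through its first coordinate, and $I^{-1}(\theta_0)/a_n^2 = 1/n$, which in this example coincides with the variance of the full-data MLE.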

We next require a condition that controls the difference between $f_n(s \mid \theta)$ and its limiting
distribution for $\theta \in \mathcal{P}_0$. Let $N(x; \mu, \Sigma)$ be the normal density at $x$ with mean $\mu$
and variance $\Sigma$. Define $\tilde{f}_n(s \mid \theta) = N(s; s(\theta), A(\theta)/a_n^2)$ and the standardization
$W_n(s) = a_n A(\theta)^{-1/2}(s - s(\theta))$. Let $f_{W_n}(w \mid \theta)$ and $\tilde{f}_{W_n}(w \mid \theta)$ be the density of $W_n(s)$ when
$s \sim f_n(s \mid \theta)$ and $\tilde{f}_n(s \mid \theta)$ respectively. The condition requires that the difference between
$f_{W_n}(w \mid \theta)$ and its Edgeworth expansion $\tilde{f}_{W_n}(w \mid \theta)$ is $o(a_n^{-2/5})$, which is weaker than the
standard requirement, $o(a_n^{-1})$, of the remainder from the Edgeworth expansion, and can be
bounded by a density with exponentially decreasing tails. Specifically, assume that

(C4) there exists $\alpha_n$ satisfying $\alpha_n / a_n^{2/5} \to \infty$ and a density $r_{\max}(w)$ satisfying the same
conditions as (C2) (ii)-(iv), such that $\sup_{\theta \in \mathcal{P}_0} \alpha_n |f_{W_n}(w \mid \theta) - \tilde{f}_{W_n}(w \mid \theta)| \le c_3 r_{\max}(w)$
for some positive constant $c_3$.

For $\theta$ outside $\mathcal{P}_0$, the following condition requires that the tails of $f_n(s \mid \theta)$ are also
exponentially decreasing.

(C5) $\sup_{\theta \in \mathcal{P}_0^c} f_{W_n}(w \mid \theta) = O(e^{-c_2 \|w\|^{\alpha_2}})$ as $\|w\| \to \infty$ for some positive constants $c_2$ and
$\alpha_2$, and $A(\theta)$ is upper bounded in $\mathcal{P}$.

The differentiability of the prior density around the true parameter is required.

(C6) $\pi(\theta) \in C^1(\mathcal{P}_0)$ and $\pi(\theta_0) > 0$.

Finally, the function of interest, $h(\theta)$, needs to satisfy some differentiability and moment
conditions in order that the remainders of its posterior moment expansion are small. Consider
the $k$th coordinate $h_k(\theta)$ of $h(\theta)$.

(C7) $h_k(\theta) \in C^1(\mathcal{P}_0)$ and $D_k h(\theta_0) \neq 0$.

(C8) $\int |h_k(\theta)| \pi(\theta)\, d\theta < \infty$ and $\int h_k(\theta)^2 \pi(\theta)\, d\theta < \infty$.

3. Asymptotics of $h_{\mathrm{ABC}}$. We first ignore the Monte Carlo error of ABC, and focus
on the ideal ABC estimator, $h_{\mathrm{ABC}}$, where $h_{\mathrm{ABC}} = E_{\pi_{\mathrm{ABC}}}[h(\theta) \mid s_{\mathrm{obs}}, \varepsilon_n]$. As an approximation
to the true posterior mean, $E[h(\theta) \mid Y_{\mathrm{obs}}]$, $h_{\mathrm{ABC}}$ contains the errors from the choice of
the bandwidth, $\varepsilon_n$, and the summary statistic $s_{\mathrm{obs}}$.

To understand the effect of these two sources of error, we derive results for the asymptotic
distributions of $h_{\mathrm{ABC}}$ and the likelihood-based estimators, including the MLE based on the
summary (MLES) and the summary-based posterior mean, where we consider randomness
solely due to the randomness of the data.

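Theorem 3.1. Assume conditions (C1)-(C8).

(i) Let $\hat{\theta}_{\mathrm{MLES}} = \arg\max_{\theta \in \mathcal{P}} \log f_n(s_{\mathrm{obs}} \mid \theta)$ be the MLES of the parameter. For $h_s =
h(\hat{\theta}_{\mathrm{MLES}})$ or $E[h(\theta) \mid s_{\mathrm{obs}}]$, as $n \to \infty$, it holds that

    $a_n (h_s - h(\theta_0)) \xrightarrow{d} N(0, Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0))$.

(ii) If $\varepsilon_n = o(a_n^{-1/2})$, as $n \to \infty$, there exists a positive definite matrix $I_{\mathrm{ABC}}(\theta_0)$ such
that

    $a_n (h_{\mathrm{ABC}} - h(\theta_0)) \xrightarrow{d} N(0, Dh(\theta_0)^T I_{\mathrm{ABC}}^{-1}(\theta_0) Dh(\theta_0))$.

If $\varepsilon_n = o(a_n^{-1})$ or $d = p$ or $A(\theta_0)$ is diagonal, $I_{\mathrm{ABC}}(\theta_0) = I(\theta_0)$. For other cases,
$I_{\mathrm{ABC}}(\theta_0) \le I(\theta_0)$.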

Theorem 3.1 (i) illustrates the validity of posterior inference based on the summary
statistics. Regardless of the sufficiency and dimension of $s_{\mathrm{obs}}$, the posterior mean based on
the summary statistics is consistent and asymptotically normal with the same variance as
that of the MLES. This is similar to the equivalence of the posterior mean and MLE based
on the full dataset implied by the classical Bernstein-von Mises theorem.

Theorem 3.1 (ii) indicates three types of bandwidth choice which determine three cases
for the approximation accuracy of the ABC posterior mean. Denote the ABC bias, $h_{\mathrm{ABC}} -
E[h(\theta) \mid s_{\mathrm{obs}}]$, by $\mathrm{bias}_{\mathrm{ABC}}$. The first case is ‘negligible’ $\varepsilon_n$, which is when $\varepsilon_n$ is $o(1/a_n)$ and for
which $\mathrm{bias}_{\mathrm{ABC}}$ is negligible. This conforms to the result implied by [35] that if $\varepsilon_n = o(1/a_n)$,
the wrong model likelihood adopted by ABC is the same as the true likelihood to the first
order. The second case is ‘dominating’ $\varepsilon_n$, which is when $\varepsilon_n$ is $\Theta(1/\sqrt{a_n})$ or $\Omega(1/\sqrt{a_n})$.
Although not formally stated here, it is expected that $\mathrm{bias}_{\mathrm{ABC}}$ is dominating in this case,
making the convergence rate of $h_{\mathrm{ABC}}$ slower than $a_n$. The third case is ‘well-behaved’
$\varepsilon_n$, which is between the previous two cases and for which the convergence rate is $a_n$.
Furthermore, $\mathrm{bias}_{\mathrm{ABC}}$ is still negligible if the dimension of the summary statistics is equal
to that of the parameter. However, if the dimension is larger, $h_{\mathrm{ABC}}$ can be less efficient than
the MLES. Although the ‘negligible’ $\varepsilon_n$ is preferred, as we see below the Monte Carlo acceptance rate
will then inevitably degenerate as $n \to \infty$, and the required Monte Carlo size would need to
increase with $n$.

When $d > p$, Theorem 3.1 (ii) shows that $\mathrm{bias}_{\mathrm{ABC}}$ is non-negligible and increases the asymptotic
variance. This is essentially because the leading term of $\mathrm{bias}_{\mathrm{ABC}}$ is proportional to the
average of $v = s - s_{\mathrm{obs}}$, the difference between the simulated and observed summary
statistics, and if $d > p$, the marginal density of $v$ is generally asymmetric, and thus has a
non-zero mean.

It has previously been argued that one should choose a summary statistic which has the
same dimension as the number of parameters [17]. However that was based on controlling
the Monte Carlo error, with for example [8] showing that the optimal rate of decreasing $\varepsilon$
as the Monte Carlo sample size increases is slower for larger $d$. The loss of efficiency we
observe in Theorem 3.1 (ii) for $d > p$ gives a separate advantage for choosing a summary
statistic with $d = p$. Remarkably, the following proposition shows that for any summary
statistic of dimension $d > p$ we can find a new $p$-dimensional summary statistic without
any loss of information.

Proposition 3.1. Assume the conditions of Theorem 3.1. If $d$ is larger than $p$, let
$C = Ds(\theta_0)^T A(\theta_0)^{-1}$, then $I_C(\theta_0) = I(\theta_0)$ where $I_C(\theta)$ is the $I(\theta)$ matrix of the summary
statistic $C S_n$. Therefore the asymptotic variance of $h_{\mathrm{ABC}}$ based on $C s_{\mathrm{obs}}$ is smaller than
or equal to that based on $s_{\mathrm{obs}}$.

Proof. The equality can be verified by algebra. □ 
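
The algebra behind the equality can be spelled out as follows. This is a sketch that assumes, as in (C3), that the transformed summary $C S_n$ has limiting mean $C s(\theta)$, Jacobian $C\,Ds(\theta)$ and limiting covariance $C A(\theta) C^T$, with $C$ held fixed at its value at $\theta_0$; it also uses the symmetry of $A(\theta_0)$ and $I(\theta_0)$.

```latex
\begin{align*}
C\,Ds(\theta_0) &= Ds(\theta_0)^T A(\theta_0)^{-1} Ds(\theta_0) = I(\theta_0), \\
C A(\theta_0) C^T &= Ds(\theta_0)^T A(\theta_0)^{-1} A(\theta_0) A(\theta_0)^{-1} Ds(\theta_0) = I(\theta_0), \\
I_C(\theta_0) &= \{C\,Ds(\theta_0)\}^T \{C A(\theta_0) C^T\}^{-1} \{C\,Ds(\theta_0)\}
  = I(\theta_0)\, I(\theta_0)^{-1}\, I(\theta_0) = I(\theta_0).
\end{align*}
```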

The proposition shows that a proper linear transformation can be an effective dimension
reduction method, when $\varepsilon_n$ is small enough that the condition in Theorem 3.1 (ii) is
satisfied. The matrix $C$ can be interpreted as the product of the scale matrix $A(\theta_0)^{-1/2}$,
which standardizes $s_{\mathrm{obs}}$, and the matrix $Ds(\theta_0)^T A(\theta_0)^{-1/2}$ which can be taken as the
‘square root’ of $I(\theta_0)$.
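
A small numerical check of this reduction is sketched below for a toy model that is an assumption of this example rather than part of the paper: i.i.d. $N(\theta, 1)$ data with the two summaries (sample mean, sample mean of squares), for which $Ds(\theta) = (1, 2\theta)^T$ and $A(\theta)$ has rows $(1, 2\theta)$ and $(2\theta, 4\theta^2 + 2)$. The function name is hypothetical.

```python
def info_before_and_after_reduction(theta0):
    """Compute C = Ds^T A^{-1}, I(theta0) and I_C(theta0) for the toy model
    y ~ N(theta, 1) with summaries (mean of y, mean of y^2); d = 2, p = 1."""
    ds = [1.0, 2.0 * theta0]                          # Ds(theta0), a 2-vector
    a = [[1.0, 2.0 * theta0],
         [2.0 * theta0, 4.0 * theta0 ** 2 + 2.0]]     # A(theta0)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [[a[1][1] / det, -a[0][1] / det],
             [-a[1][0] / det, a[0][0] / det]]
    # C = Ds^T A^{-1}: a 1 x 2 row vector mapping the summary to dimension p = 1.
    c = [ds[0] * a_inv[0][j] + ds[1] * a_inv[1][j] for j in range(2)]
    # I(theta0) = Ds^T A^{-1} Ds for the original summary.
    info = c[0] * ds[0] + c[1] * ds[1]
    # For the reduced summary C S_n: Jacobian C Ds and covariance C A C^T.
    c_ds = c[0] * ds[0] + c[1] * ds[1]
    c_a_ct = sum(c[i] * a[i][j] * c[j] for i in range(2) for j in range(2))
    info_c = c_ds * c_ds / c_a_ct
    return c, info, info_c
```

In this toy model the reduction picks out the sample mean: `c` is numerically `(1, 0)`, and both information values equal 1.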

Theorem 3.1 leads to the following natural definition.

Definition 1. Assume that the conditions of Theorem 3.1 hold. Then the asymptotic
variance of $h_{\mathrm{ABC}}$ is

    $\mathrm{AV}_{h_{\mathrm{ABC}}} = \frac{1}{a_n^2} Dh(\theta_0)^T I_{\mathrm{ABC}}^{-1}(\theta_0) Dh(\theta_0)$.

4. Asymptotic Properties of Rejection and Importance Sampling ABC.


4.1. Asymptotic Monte Carlo Error. We now consider the Monte Carlo error involved
in estimating $h_{\mathrm{ABC}}$. Here we fix the data and consider randomness solely in terms of
the stochasticity of the Monte Carlo algorithm. We focus on the importance sampling
algorithm given in the introduction. Remember that $N$ is the Monte Carlo sample size. For
$i = 1, \ldots, N$, $\theta_i$ is the proposed parameter value and $w_i$ is its importance sampling weight.
Let $\phi_i$ be the indicator that is 1 if and only if $\theta_i$ is accepted in step 3 of Algorithm 1 and
$N_{\mathrm{acc}} = \sum_{i=1}^{N} \phi_i$ be the number of accepted parameters.

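Provided $N_{\mathrm{acc}} \ge 1$ we can estimate $h_{\mathrm{ABC}}$ from the output of the importance sampling
algorithm with

    $\hat{h} = \sum_{i=1}^{N} h(\theta_i) w_i \phi_i \Big/ \sum_{i=1}^{N} w_i \phi_i$.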
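
The weighted estimator of $h_{\mathrm{ABC}}$ formed from the accepted proposals can be written as a short helper. This is a minimal sketch: the function name and argument layout are hypothetical, and the estimator is the usual self-normalised importance sampling average over accepted draws.

```python
def abc_estimate(thetas, weights, accepted, h=lambda theta: theta):
    """Self-normalised importance sampling estimate of the ABC posterior mean of h.

    thetas   : proposed parameter values theta_i
    weights  : importance weights w_i = pi(theta_i) / q_n(theta_i)
    accepted : booleans phi_i, True when theta_i was accepted
    """
    numerator = sum(h(t) * w for t, w, a in zip(thetas, weights, accepted) if a)
    denominator = sum(w for w, a in zip(weights, accepted) if a)
    if denominator == 0:
        # Corresponds to N_acc = 0, where the estimator is undefined.
        raise ValueError("no accepted proposals")
    return numerator / denominator
```

For example, with proposals `[1.0, 2.0, 3.0]`, weights `[1.0, 3.0, 5.0]` and only the first two accepted, the estimate is `(1 + 6) / 4 = 1.75`.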

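Define

    $p_{\mathrm{acc},q} = \int q(\theta) \int f_n(s \mid \theta) K_\varepsilon(s - s_{\mathrm{obs}})\, ds\, d\theta$,

which is the acceptance probability of the importance sampling algorithm proposing from
$q(\theta)$. Furthermore, define

    $q_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon) \propto q_n(\theta) f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)$,

the density of the accepted parameter; and

(2)    $\Sigma_{\mathrm{IS},n} = E_{\pi_{\mathrm{ABC}}}\left[ (h(\theta) - h_{\mathrm{ABC}})^2 \frac{\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)}{q_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)} \right]$  and  $\Sigma_{\mathrm{ABC},n} = p_{\mathrm{acc},q_n}^{-1} \Sigma_{\mathrm{IS},n}$,

where $\Sigma_{\mathrm{IS},n}$ is the IS variance with $\pi_{\mathrm{ABC}}$ as the target density and $q_{\mathrm{ABC}}$ as the proposal
density. Note that $p_{\mathrm{acc},q_n}$ and $\Sigma_{\mathrm{IS},n}$, and hence $\Sigma_{\mathrm{ABC},n}$, depend on $s_{\mathrm{obs}}$.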

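Standard results give the following asymptotic distribution of $\hat{h}$.


Proposition 4.1. For a given $n$ and $s_{\mathrm{obs}}$, if $h_{\mathrm{ABC}}$ and $\Sigma_{\mathrm{ABC},n}$ are finite, then

    $\sqrt{N} (\hat{h} - h_{\mathrm{ABC}}) \xrightarrow{d} N(0, \Sigma_{\mathrm{ABC},n})$,

as $N \to \infty$.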


The proposition motivates the following definition.

Definition 2. For a given $n$ and $s_{\mathrm{obs}}$, assume that the conditions of Proposition 4.1
hold. Then the asymptotic Monte Carlo variance of $\hat{h}$ is

    $\mathrm{MCV}_{\hat{h}} = \frac{1}{N} \Sigma_{\mathrm{ABC},n}$.







From Proposition 4.1, it can be seen that the asymptotic Monte Carlo variance of $\hat{h}$ is
equal to the IS variance $\Sigma_{\mathrm{IS},n}$ divided by the average number of acceptances $N p_{\mathrm{acc},q_n}$, and
therefore depends on the proposal distribution and $\varepsilon_n$ through these two terms.

4.2. Asymptotic efficiency. We have defined the asymptotic variance, as $n \to \infty$, of
$h_{\mathrm{ABC}}$, and the asymptotic Monte Carlo variance, as $N \to \infty$, of $\hat{h}$. The error of
$h_{\mathrm{ABC}}$ when estimating $h(\theta_0)$ and the Monte Carlo error of $\hat{h}$ when estimating $h_{\mathrm{ABC}}$ are
independent of each other. This suggests the following definition.


Definition 3. Assume the conditions of Theorem 3.1, and that $h_{\mathrm{ABC}}$ and $\Sigma_{\mathrm{ABC},n}$ are
bounded in probability for any $n$. Then the asymptotic variance of $\hat{h}$ is

    $\mathrm{AV}_{\hat{h}} = \frac{1}{a_n^2} Dh(\theta_0)^T I_{\mathrm{ABC}}^{-1}(\theta_0) Dh(\theta_0) + \frac{1}{N} \Sigma_{\mathrm{ABC},n}$.


That is, the asymptotic variance of $\hat{h}$ is the sum of its asymptotic Monte Carlo variance
for estimating $h_{\mathrm{ABC}}$, and the asymptotic variance of $h_{\mathrm{ABC}}$.

We now wish to investigate the properties of this asymptotic variance, for large but fixed
$N$, as $n \to \infty$. In particular we are interested in the ratio between $\mathrm{AV}_{\hat{h}}$ and $\mathrm{AV}_{\mathrm{MLES}}$, where,
by Theorem 3.1, the latter is defined as $a_n^{-2} Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0)$. We will consider how
this ratio depends on the choice of $\varepsilon_n$ and $q_n(\theta)$. Thus we introduce the following definition:


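Definition 4. For a choice of $\varepsilon_n$ and $q_n(\theta)$, we define the asymptotic efficiency of $\hat{h}$
as

    $\mathrm{AE}_{\hat{h}} = \lim_{n \to \infty} \frac{\mathrm{AV}_{\mathrm{MLES}}}{\mathrm{AV}_{\hat{h}}}$.

If this limiting value is 0, we say that $\hat{h}$ is asymptotically inefficient.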


We will investigate the asymptotic efficiency of $\hat{h}$ under the assumption of Theorem 3.1
that $\varepsilon_n = o(1/\sqrt{a_n})$. We will see that the convergence rate of the IS variance $\Sigma_{\mathrm{IS},n}$ depends
on how large $\varepsilon_n$ is, and so we further define $c_\varepsilon = \lim_{n \to \infty} a_n \varepsilon_n$, assuming that this limit
exists, and let $a_{n,\varepsilon} = a_n \mathbb{1}_{c_\varepsilon < \infty} + \varepsilon_n^{-1} \mathbb{1}_{c_\varepsilon = \infty}$. Note that $c_\varepsilon$ can be either a constant or infinity.
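
To see how the rate $a_{n,\varepsilon}$ behaves, it may help to evaluate it for two bandwidth sequences. The particular sequences below are chosen here purely for illustration; both satisfy $\varepsilon_n = o(a_n^{-1/2})$.

```latex
\begin{align*}
\varepsilon_n = a_n^{-1}: \quad
  & c_\varepsilon = \lim_{n\to\infty} a_n \varepsilon_n = 1 < \infty
  && \Rightarrow \quad a_{n,\varepsilon} = a_n, \\
\varepsilon_n = a_n^{-3/4}: \quad
  & c_\varepsilon = \lim_{n\to\infty} a_n^{1/4} = \infty
  && \Rightarrow \quad a_{n,\varepsilon} = \varepsilon_n^{-1} = a_n^{3/4}.
\end{align*}
```

In both cases $a_{n,\varepsilon}$ is of the order of the smaller of $a_n$ and $\varepsilon_n^{-1}$, so a bandwidth that shrinks more slowly than $a_n^{-1}$ lowers the rate.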

First we show that if we propose from the prior or the posterior, then the ABC estimator 
is asymptotically inefficient. 


Theorem 4.1. Assume the conditions of Theorem 3.1. We have:

(i) If $q_n(\theta) = \pi(\theta)$, $p_{\mathrm{acc},q_n} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^{d-p})$ and $\Sigma_{\mathrm{IS},n} = \Theta_p(a_{n,\varepsilon}^{-2})$.

(ii) If $q_n(\theta) = \pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)$, $p_{\mathrm{acc},q_n} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^{d})$ and $\Sigma_{\mathrm{IS},n} = \Theta_p(a_{n,\varepsilon}^{p})$.


In both cases $\hat{h}$ is asymptotically inefficient.

Note the result in part (ii) shows the difference from standard importance sampling
settings, where using the target distribution as the proposal leads to an estimator with no
Monte Carlo error.

The reason why $\hat{h}$ is asymptotically inefficient is that the Monte Carlo variance decays
more slowly than $1/a_n^2$ as $n \to \infty$. However the problem with the Monte Carlo variance is
caused by different factors in each case.

To see this, consider the acceptance probability of a value of $\theta$ and corresponding summary
$s_n$ simulated in one iteration of the IS-ABC algorithm. This acceptance probability
depends on

(3)    $\frac{s_n - s_{\mathrm{obs}}}{\varepsilon_n} = \frac{1}{\varepsilon_n} \left[ (s_n - s(\theta)) + (s(\theta) - s(\theta_0)) + (s(\theta_0) - s_{\mathrm{obs}}) \right]$,

where $s(\theta)$, defined in (C3), is the limiting value of $s_n$ as $n \to \infty$ if data is sampled
from the model for parameter value $\theta$. By (C3) the first and third bracketed terms within
the square brackets on the right-hand side are $O_p(a_n^{-1})$. If we sample from the prior, then
the middle term is $O_p(1)$, and thus (3) will blow up as $\varepsilon_n$ goes to 0. Hence $p_{\mathrm{acc},\pi}$ goes to
0 as $\varepsilon_n$ goes to 0 and thus causes the estimate to be inefficient. If we sample from the
posterior, then by Theorem 3.1 we expect the middle term to also be $O_p(a_n^{-1})$. Hence (3) is
well behaved as $n \to \infty$, and consequently $p_{\mathrm{acc},\pi_{\mathrm{ABC}}}$ is bounded away from 0, provided either
$\varepsilon_n = \Theta(a_n^{-1})$ or $\varepsilon_n = \Omega(a_n^{-1})$.

However, using $\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)$ as a proposal distribution still causes the estimate to
be inefficient due to an increasing variance of the importance weights. As $n$ increases the
proposal is more and more concentrated around $\theta_0$, while $\pi$ does not change. Therefore the
weight, which is the ratio of $\pi_{\mathrm{ABC}}$ and $q_{\mathrm{ABC}}$, is increasingly skewed and causes $\Sigma_{\mathrm{IS},n}$ to
go to $\infty$.

As discussed after Theorem 3.1, when $\varepsilon_n = o(a_n^{-1})$, its effect on the bias is negligible.
However, for any Monte Carlo algorithm making acceptance/rejection through $K(v)$, the
acceptance probability with this choice of $\varepsilon_n$ goes to 0 as $n \to \infty$. This is because, in (3), the
mechanism simulating the dataset determines that $s_n - s_{\mathrm{obs}}$ is $O_p(a_n^{-1})$ and hence, with the
negligible $\varepsilon_n$, (3) will blow up, making the acceptance probability degenerate. In such a
case, $N$ needs to increase with $n$ to compensate for the decreasing acceptance rate.

4.3. Efficient Proposal Distributions. Whilst using the prior and the posterior leads
to asymptotically inefficient estimators, it will be seen that there exist practical proposal
distributions that avoid this inefficiency. Consider proposing the parameter value from a
location-scale family. That is, our proposal is of the form $\sigma_n \Sigma^{1/2} X + \mu_n$, where $X \sim q(\cdot)$,
$E[X] = 0$ and $\mathrm{Var}[X] = I_p$. This defines a general form of proposal density, where the
center, $\mu_n$, the scale rate, $\sigma_n$, the scale matrix, $\Sigma$, and the base density, $q(\cdot)$, all need to be
specified. We will give conditions under which such a proposal density results in estimators
that are not inefficient.
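
The contrast between proposing from the prior and from a well-scaled location-scale family can be made concrete in a toy setting where the acceptance probability has a closed form. The sketch below rests on assumptions that are not from the paper: $d = p = 1$, summary $s \mid \theta \sim N(\theta, 1/n)$, kernel $K(v) = \exp(-v^2/2)$, and a normal proposal $N(\mu, \sigma^2)$, for which integrating out $\theta$ and $s$ gives $p_{\mathrm{acc}} = \varepsilon\, (\sigma^2 + 1/n + \varepsilon^2)^{-1/2} \exp\{-(s_{\mathrm{obs}} - \mu)^2 / (2(\sigma^2 + 1/n + \varepsilon^2))\}$. The function names are hypothetical.

```python
import math


def acceptance_prob(n, eps, s_obs, prop_mean, prop_sd):
    """Exact ABC acceptance probability in the toy model (an assumption):
    theta ~ N(prop_mean, prop_sd^2), s | theta ~ N(theta, 1/n),
    accept with probability exp(-0.5 * ((s - s_obs) / eps)^2)."""
    total_var = prop_sd ** 2 + 1.0 / n + eps ** 2
    return (eps / math.sqrt(total_var)
            * math.exp(-0.5 * (s_obs - prop_mean) ** 2 / total_var))


def compare_proposals(n, s_obs=0.3, prior_sd=10.0):
    """Acceptance probability with eps_n = n^{-1/2} for two proposals:
    the N(0, prior_sd^2) prior, and a local proposal centred at s_obs
    with standard deviation n^{-1/2}."""
    eps = 1.0 / math.sqrt(n)
    p_prior = acceptance_prob(n, eps, s_obs, 0.0, prior_sd)
    p_local = acceptance_prob(n, eps, s_obs, s_obs, 1.0 / math.sqrt(n))
    return p_prior, p_local
```

In this toy setting the prior proposal's acceptance probability shrinks roughly like $\varepsilon_n$ as $n$ grows, while the local proposal's stays at $1/\sqrt{3}$ for every $n$.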

Our results are based on an expansion of $\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)$, obtained from the proof of
Lemma 6 in the Appendix. Consider the rescaled random variables $t = a_{n,\varepsilon}(\theta - \theta_0)$ and
$v = \varepsilon_n^{-1}(s - s_{\mathrm{obs}})$. Let $T_{\mathrm{obs}} = a_n A(\theta_0)^{-1/2}(s_{\mathrm{obs}} - s(\theta_0))$. Define a joint density of $t$ and $v$
as the following,

    $g_n(t, v; \tau) \propto N\big((Ds(\theta_0) + \tau)t;\ a_n \varepsilon_n v + A(\theta_0)^{1/2} T_{\mathrm{obs}},\ A(\theta_0)\big) K(v)$,  when $a_n \varepsilon_n \to c_\varepsilon < \infty$,

    $g_n(t, v; \tau) \propto N\big((Ds(\theta_0) + \tau)t;\ v + \tfrac{1}{a_n \varepsilon_n} A(\theta_0)^{1/2} T_{\mathrm{obs}},\ \tfrac{1}{a_n^2 \varepsilon_n^2} A(\theta_0)\big) K(v)$,  when $a_n \varepsilon_n \to \infty$,

and $g_n(t; \tau) = \int g_n(t, v; \tau)\, dv$. These are defined so that for large $n$ and for the rescaled
variables, the leading term of $\pi_{\mathrm{ABC}}$ is proportional to $g_n(t; 0)$. Note that this is a continuous
mixture of normal densities with the kernel density as the weights.

The main theorem here requires two conditions on the proposal density. First, we need
the density for the scaled random variable $t$ to be proper, which requires $\sigma_n = a_{n,\varepsilon}^{-1}$
and that $c_\mu = \sigma_n^{-1}(\mu_n - \theta_0)$ is $O_p(1)$. This ensures that the acceptance probability is bounded away from 0. Second,
we need the importance ratio between the target and the proposal densities to satisfy the
following:

(C10) $\exists\, \alpha \in (0,1)$ and a small enough $\delta > 0$ such that, for any $\mu$ bounded in probability,

$$
\sup_{t \in \mathbb{R}^p,\ \tau^T\tau \le \delta I_p} \frac{g_n(t; \tau)^{\alpha}}{q\{\Sigma^{-1/2}(t - \mu)\}} = O_p(1).
$$

If we further choose $\varepsilon_n = \Theta(a_n^{-1})$, the Monte Carlo IS variance for the accepted parameter
values is $O(a_n^{-2})$, and has the same order as the variance of MLES.


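Theorem 4.2. Assume the conditions of Theorem 3.1. If

$$
q_n(\theta) = \beta\pi(\theta) + (1 - \beta)\frac{1}{\sigma_n^p |\Sigma|^{1/2}}\, q\{\sigma_n^{-1}\Sigma^{-1/2}(\theta - \mu_n)\},
$$

where $\beta \in (0,1)$, $q(\cdot)$ and $\Sigma$ satisfy (C10) and $\sigma_n = a_{n,\varepsilon}^{-1}$, then it holds that $p_{acc,q_n} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^d)$
and $\Sigma_{IS,n} = O_p(a_{n,\varepsilon}^{-2})$. Then if $\varepsilon_n = \Theta(a_n^{-1})$, $\mathrm{AE}_{\hat h} = \Theta_p(1)$.

Furthermore, if $d = p$, $\mathrm{AE}_{\hat h} = 1 - K/(N + K)$ for some constant $K$.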

The mixture with $\pi(\theta)$ here is to control the importance weight in the tail area, similarly
to the defensive importance sampling of [21]. It is not clear whether this is needed in
practice, or is just a consequence of the approach taken in the proof.

The above result also shows that, with a good proposal distribution, if the acceptance
probability is bounded away from 0 as $n$ increases, the threshold $\varepsilon_n$ will have the preferred
rate $\Theta(a_n^{-1})$. This supports the intuitive idea of using the acceptance rate in ABC to choose
the threshold based on aiming for an appropriate proportion of acceptances [e.g. 15, 6].
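
The idea of fixing a proportion of acceptances and letting it determine the threshold can be sketched as follows. This is a minimal illustration, assuming that the distances between simulated and observed summaries have already been computed and that a uniform kernel is used; the function name and arguments are hypothetical.

```python
import numpy as np


def threshold_from_acceptance_rate(distances, rate):
    """Return the smallest eps such that at least a fraction `rate` of simulations have distance <= eps."""
    distances = np.asarray(distances, dtype=float)
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    k = max(1, int(np.ceil(rate * distances.size)))   # number of simulations to accept
    return np.partition(distances, k - 1)[k - 1]      # k-th smallest distance
```

Accepting every simulation with distance at most the returned value then gives the requested acceptance proportion, up to ties.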

In practice, $\sigma_n$ and $\mu_n$ obviously need to adapt to the observations since they
depend on $n$. For $q(\cdot)$ and $\Sigma$, the following proposition gives a practical suggestion satisfying
(C10). Let $T(\cdot; \gamma)$ be the multivariate t density with $\gamma$ degrees of freedom.

Algorithm 2: Iterative Importance Sampling ABC

Choose a small mixture weight $\beta$, and a sequence of acceptance rates $\{p_k\}$. Choose a location-scale family
(such as a t-distribution). Let $q_0$ be the density from this family that has the same mean and variance as the
prior.

At the kth step:

1. Run IS-ABC with simulation size $N_0$, proposal density $\beta\pi(\theta) + (1 - \beta)q_k(\theta)$ and acceptance rate $p_k$, and
record the bandwidth $\varepsilon_k$.

2. If $\varepsilon_{k-1} - \varepsilon_k$ is smaller than some positive threshold, stop. Otherwise, let $\mu_{k+1}$ and $\Sigma_{k+1}$ be the
empirical mean and variance matrix of the weighted sample from step 1, and let $q_{k+1}(\theta)$ be the density
with centre $\mu_{k+1}$ and variance matrix $2\Sigma_{k+1}$.

3. If $q_k(\theta)$ is close to $q_{k+1}(\theta)$, stop. Otherwise, return to step 1.

After the iteration stops at the Kth step, run the IS-ABC with the proposal density $\beta\pi(\theta) + (1 - \beta)q_{K+1}(\theta)$,
$N - KN_0$ simulations and $p_{K+1}$.


Proposition 4.2. If $\exists\, \gamma_0 > 0$ and $\alpha \in (0,1)$ such that $K(v)^{\alpha}/T(v; \gamma_0) < M$ for some
constant $M$, then (C10) is satisfied for $q(\theta) = T(\theta; \gamma)$, where $\gamma < \gamma_0$, and any $\Sigma$.

The above result says that it is theoretically valid to choose any $\Sigma$ if a t distribution
with any choice of $\gamma$ is chosen as the base density, provided the kernel has lighter tails
than the t-distribution.

4.4. Iterative Importance Sampling ABC. Taken together, Theorem 4.2 and Proposition 4.2
suggest proposing from the mixture of $\pi(\theta)$ and a t distribution with the scale
matrix and center approximating those of $\pi_{ABC}(\theta)$. We suggest using an iterative procedure
[similar in spirit to that of 4], see Algorithm 2.

In this algorithm, $N$ is the number of simulations allowed by the computing budget,
$N_0 < N$ and $\{p_k\}$ is a sequence of acceptance rates, which we use to choose the bandwidth.
The rule for choosing the new proposal distribution is based on approximating the mean
and variance of the density proportional to $\pi(\theta) f_{ABC}(s_{obs} \mid \theta, \varepsilon)^{1/2}$, which is optimal in the
sense of maximising the ESS of importance sampling [17]. It can be shown that these
two moments are approximately equal to the mean and twice the variance of $\pi_{ABC}(\theta)$
respectively. For the mixture weight, $\beta$, we suggest using 0.05. Since Algorithm 2 has the
same simulation size as the rejection ABC and the additional calculations have negligible
computational cost, the iterative procedure does not introduce additional computational cost.
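
The steps of Algorithm 2 can be sketched as follows for a scalar parameter. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses a uniform kernel with the absolute distance, takes frozen SciPy distributions for the prior and proposal, stops on the change in bandwidth only (the closeness check between successive proposals in step 3 is omitted), and updates the proposal before testing the stopping rule so that a proposal is always available for the final run. The helper names `t_with_moments`, `is_abc`, `iterative_is_abc` and the user-supplied `simulate(rng, theta)` are hypothetical.

```python
import numpy as np
from scipy import stats


def t_with_moments(mean, var, df):
    """Frozen Student-t with the requested mean and variance (requires df > 2)."""
    return stats.t(df, loc=mean, scale=np.sqrt(var * (df - 2.0) / df))


def is_abc(rng, n_sim, s_obs, simulate, prior, q, beta, rate):
    """One IS-ABC pass: uniform kernel, proposal beta * prior + (1 - beta) * q."""
    from_prior = rng.random(n_sim) < beta
    theta = np.where(from_prior,
                     prior.rvs(size=n_sim, random_state=rng),
                     q.rvs(size=n_sim, random_state=rng))
    dist = np.abs(simulate(rng, theta) - s_obs)
    eps = np.quantile(dist, rate)              # bandwidth accepting a fraction `rate`
    keep = dist <= eps
    prior_pdf = prior.pdf(theta)
    weights = prior_pdf / (beta * prior_pdf + (1.0 - beta) * q.pdf(theta))
    return theta[keep], weights[keep], eps


def iterative_is_abc(rng, s_obs, simulate, prior, n_total, n0, rates,
                     beta=0.05, df=5, eps_tol=1e-3):
    """Return the weighted ABC posterior mean and the final bandwidth."""
    q = t_with_moments(prior.mean(), prior.var(), df)      # q_0 matches the prior moments
    eps_prev, k = np.inf, 0
    while (k + 2) * n0 <= n_total:                          # keep budget for the final run
        rate = rates[min(k, len(rates) - 1)]
        theta, w, eps = is_abc(rng, n0, s_obs, simulate, prior, q, beta, rate)
        k += 1
        mean = np.average(theta, weights=w)
        var = np.average((theta - mean) ** 2, weights=w)
        q = t_with_moments(mean, 2.0 * var, df)             # centre mu_{k+1}, variance 2 * Sigma_{k+1}
        if eps_prev - eps < eps_tol:                        # bandwidth has stopped shrinking
            break
        eps_prev = eps
    final_rate = rates[min(k, len(rates) - 1)]
    theta, w, eps = is_abc(rng, n_total - k * n0, s_obs, simulate, prior, q, beta, final_rate)
    return np.average(theta, weights=w), eps
```

For example, with a uniform prior `stats.uniform(loc=-10, scale=20)` and a simulator that returns the sample mean of n unit-variance normal observations, the proposal contracts around the observed summary within a few pilot runs.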

5. Numerical Examples. 

5.1. Gaussian Likelihood with Sample Quantiles. This example illustrates the results
in Section 3 with an analytically tractable problem. Assume the observations $Y_{obs} =
(y_1, \dots, y_n)$ follow the univariate normal distribution $N(\mu, \sigma)$ with true parameter values
$(1, \sqrt{2})$. Consider estimating the unknown parameter $(\mu, \sigma)$ with the uniform prior in
the area $[-10, 10] \times [-10, 10]$ using Algorithm 1. The summary statistic implemented in
Algorithm 1 is $(e^{\hat q_{\alpha_1}/2}, \dots, e^{\hat q_{\alpha_d}/2})$, where $\hat q_{\alpha}$ is the sample quantile of $Y_{obs}$ for probability
$\alpha$. Since the likelihood function and asymptotic distribution of the summary statistic are
analytically available [31], the theoretical results in Theorem 3.1 and Proposition 3.1 may
be verified. This summary statistic is illuminating because it is easy to change the information
contained by changing the number of quantiles, and it avoids the trivial case that
$s(\theta)$ is a linear function of $\theta$.
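
The summary statistic described above can be sketched as follows. The choice of probabilities j/(d+1), j = 1, ..., d, is an assumption made here as one way of taking equally spaced probabilities in (0,1), and the function name is hypothetical; NumPy's default linear-interpolation quantile is used.

```python
import numpy as np


def quantile_summary(y, d):
    """Summary (exp(q_1 / 2), ..., exp(q_d / 2)) from d equally spaced sample quantiles of y."""
    alphas = np.arange(1, d + 1) / (d + 1)     # assumed spacing: 1/(d+1), ..., d/(d+1)
    return np.exp(np.quantile(np.asarray(y, dtype=float), alphas) / 2.0)
```

Changing `d` changes the amount of information carried by the summary, while the exponential transform keeps the summary a nonlinear function of the parameters.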

The results for data size $n = 10^5$ are presented. Smaller sizes from $10^2$ to $10^4$ have also
been tested, and all show similar patterns. The probabilities $\alpha_1, \dots, \alpha_d$ for calculating
quantiles are selected with equal intervals in (0,1), and $d = 2, 4, 9$ and 19 are tested.
In order to investigate the Monte Carlo error-free performance, $N$ is chosen to be large
enough. The performance of $\theta_{ABC}$, MLES and MLE are compared. Since the dimension
reduction matrix $C$ in Proposition 3.1 can be obtained analytically, the performance of
$\theta_{ABC}$ using the original $d$-dimension summary is compared with that using the 2-dimension
summary. The results for the mean square error (MSE) are presented in Figure 1.

The phenomena implied by Theorem 3.1 and Proposition 3.1 can be seen in this example,
together with the limitations of these results. First, $E[h(\theta) \mid s_{obs}]$, equivalent to $\theta_{ABC}$
with small enough $\varepsilon$, and MLES have the same performance. Second, the three stages of
increasing $\varepsilon$ can be seen in all graphs. When $\varepsilon$ is small, the MSEs of $\theta_{ABC}$ achieve those
of MLES. Then, when $\varepsilon$ becomes larger, for $d > 2$ the MSEs quickly increase to be significantly
larger than those of the MLES, while for $d = 2$ there are no such obvious gaps.
This corresponds to the 'well-behaved' $\varepsilon$. Finally, the rate of increase of the MSEs becomes larger
as $\varepsilon$ increases further, with $\varepsilon$ becoming more and more 'dominating'.

Third, for all cases, the 2-dimension summary gives the same performance of $\theta_{ABC}$ as the
MLES for small $\varepsilon$, indicating that it contains the same information as the original summary.
However, it can be seen that for larger $\varepsilon$, the performance of the reduced-dimension summaries
is not stable, and is in fact worse than that of the original summaries for estimating $\mu$,
although better for estimating $\sigma$. The worse/better performance is caused by the larger/smaller
bias of $\theta_{ABC}$. This is due to the second order behaviour of $\theta_{ABC}$, which becomes important
for larger $\varepsilon$. This suggests using other techniques for reducing the bias, e.g. the regression
adjustment, together with the dimension-reduction matrix for more stable behaviour.

5.2. Stochastic Volatility with AR(1) Dynamics. Consider the stochastic volatility model
in [32]

$$
\begin{cases}
x_n = \phi x_{n-1} + \eta_n, & \eta_n \sim N(0, \sigma_\eta^2), \\
y_n = \bar\sigma e^{x_n/2} \xi_n, & \xi_n \sim N(0, 1),
\end{cases}
$$

where $\eta_n$ and $\xi_n$ are independent, $y_n$ is the demeaned return of a portfolio obtained by
subtracting the average of all returns from the actual return, and $\bar\sigma$ is the average volatility
level. By the transformation $y_n^* = \log y_n^2$ and $\xi_n^* = \log \xi_n^2$, the state-space model can be
transformed to

(4)
$$
\begin{cases}
x_n = \phi x_{n-1} + \eta_n, & \eta_n \sim N(0, \sigma_\eta^2), \\
y_n^* = 2\log\bar\sigma + x_n + \xi_n^*, & \exp(\xi_n^*) \sim \chi_1^2,
\end{cases}
$$

which is linear and non-Gaussian.

Fig 2. Comparisons of R-ABC and IIS-ABC with two implementations for increasing $n$. For IIS-mix, the
estimation stage uses the mixture as the proposal distribution, and for IIS-t, the t distribution only. For each
$n$, the logarithm of the average MSE over 100 datasets, multiplied by $n$, is reported. For each dataset, the Monte
Carlo sample size of ABC estimators is $10^4$. The ratio of the MSEs of the two methods is given in the table,
and smaller values indicate better performance of the IIS-ABC.

The ABC method can be used to obtain an off-line estimator for the unknown parameter
of state-space models, as recently discussed by [24]. Here we illustrate the
effectiveness of iteratively choosing the importance proposal for large $n$ by comparing the
performance of the rejection ABC (R-ABC) and the iterative IS-ABC. In the iterative algorithm,
a t distribution with 5 degrees of freedom is used to construct $q$. In order to see whether
it is necessary to bound the skewed importance weights using the mixture, we implement the
final estimation using two proposal distributions, the mixture $\beta\pi(\theta) + (1 - \beta)q_{K+1}(\theta)$ and
$q_{K+1}(\theta)$ only.

Consider the estimation of the parameter $(\phi, \sigma_\eta, \log\bar\sigma)$ with the uniform prior in the
area $[0, 1) \times [0.1, 3] \times [-10, -1]$. The setting with the true parameter $(\phi, \sigma_\eta, \log\bar\sigma) =
(0.9, 0.675, -4.1)$ is studied, which is motivated by the empirical studies. For any dataset
$Y = (y_1, \dots, y_n)$, let $Y^* = (y_1^*, \dots, y_n^*)$. The summary statistic $s_n(Y) = (\widehat{\mathrm{Var}}[Y^*], \widehat{\mathrm{Cor}}[Y^*], \widehat{E}[Y^*])$
is used, where $\widehat{\mathrm{Var}}$, $\widehat{\mathrm{Cor}}$ and $\widehat{E}$ denote the empirical variance, lag-1 autocorrelation and
mean. If there were no noise in the state equation for $\xi_n^*$ in (4), then $s_n(Y)$ would be a sufficient
statistic of $Y^*$, and hence it is a natural choice for the summary statistic. The uniform
kernel is used in the accept-reject step of ABC.
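
The simulation of the model and the computation of the summary statistic can be sketched as follows. This is a minimal sketch under stated assumptions: the state is started from its stationary distribution, the third parameter is taken to be the logarithm of the average volatility level, and the function names are hypothetical.

```python
import numpy as np


def simulate_sv(rng, n, phi, sigma_eta, log_sigma_bar):
    """Simulate y_1, ..., y_n with x_n = phi * x_{n-1} + eta_n and y_n = sigma_bar * exp(x_n / 2) * xi_n."""
    eta = sigma_eta * rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eta[0] / np.sqrt(1.0 - phi ** 2)    # stationary start (an assumption; needs |phi| < 1)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + eta[i]
    xi = rng.standard_normal(n)
    return np.exp(log_sigma_bar + x / 2.0) * xi


def sv_summary(y):
    """Empirical variance, lag-1 autocorrelation and mean of y* = log(y^2)."""
    y_star = np.log(np.asarray(y, dtype=float) ** 2)
    centred = y_star - y_star.mean()
    var = centred @ centred / y_star.size
    lag1 = (centred[1:] @ centred[:-1]) / (centred @ centred)
    return np.array([var, lag1, y_star.mean()])
```

Under this model the mean of the transformed data is centred at twice the log volatility level plus the mean of a log chi-squared variable with one degree of freedom, which is why the third summary is directly informative about that parameter.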

The data lengths $n = 100$, 500, 2000 and 10000 are tested with the simulation budget
$N = 10000$. For the IIS-ABC, the sequence $\{p_k\}$ has the first five values being 5% to 1%,
decreasing by 1%, and the other values being 1%. For R-ABC, both 5% and 1% quantiles
are tried and 5% is chosen for its better performance. For each iteration, $N_0 = 1000$. The
simulation results are shown in Figure 2.

It can be seen that for all parameters, the IIS-ABC shows increasing advantage over the
R-ABC as $n$ increases. For larger $n$, the iterative procedure obtains a center of the proposals
closer to the true parameter and a bandwidth smaller than those used in the R-ABC, and
the difference becomes more significant when $n$ increases. These contribute to the more
accurate ABC estimators. For smaller $n$, both perform similarly, since when the summary
statistic is not accurate enough, the ABC posterior is not much different from the prior, and
the benefit of sampling from a slightly better proposal does not compensate for the increased
Monte Carlo variance from the importance weight. It is relatively easier to estimate $\log\bar\sigma$,
since the summary statistic $\widehat{E}[Y^*]$ is centered at a linear function of $\log\bar\sigma$, and therefore
IIS-ABC does not show as much advantage over R-ABC as when estimating $\phi$ and $\sigma_\eta$. Finally, the
performance with and without the mixture for the proposal density is similar.

6. Summary and Discussion. The results in this paper suggest that ABC can scale
to large data, at least for models with a fixed number of parameters. Under the assumption
that the summary statistics obey a central limit theorem (as defined in Condition C3),
the ABC posterior mean of a function of the parameters is asymptotically
normally distributed about the true value of that function. The asymptotic variance of the
estimator is equal to the asymptotic variance of the MLE for the function given the summary
statistic. Moreover, without loss of asymptotic efficiency we can always use a summary statistic
that has the same dimension as the number of parameters. This is a stronger result than
that of [17], where they show that choosing the same number of summaries as parameters
is optimal when interest is in estimating just the parameters.

We have further shown that appropriate importance sampling implementations of ABC
are efficient, in the sense of increasing the asymptotic variance of our estimator by a factor
that is just $O(1/N)$. Similar results are likely to apply to SMC and MCMC implementations
of ABC. For example, ABC-MCMC will be efficient provided the acceptance
probability does not degenerate to 0 as $n$ increases. At stationarity, ABC-MCMC
will propose parameter values from a distribution close to the ABC posterior density, and
Theorems 5.1 and 5.2 suggest that for such a proposal distribution the acceptance probability
of ABC will be bounded away from 0.

Whilst our theoretical results suggest that point estimates based on the ABC posterior 
have good properties, they do not suggest that the ABC posterior is a good approximation 
to the true posterior, nor that the ABC posterior will accurately quantify the uncertainty 
in estimates. It can be shown from a simple Gaussian example that the ABC posterior will 
tend to over-estimate the uncertainty. 
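
As a hedged illustration of this point, consider the following setting, which is an assumption made here for concreteness and not necessarily the example the authors have in mind: a scalar parameter with a flat prior, a summary with $s \mid \theta \sim N(\theta, a_n^{-2})$, and a standard Gaussian kernel $K$ with bandwidth $\varepsilon_n$. Then

$$
f_{ABC}(s_{obs} \mid \theta) = \int N(s_{obs} + \varepsilon_n v;\ \theta,\ a_n^{-2})\, K(v)\, dv = N(s_{obs};\ \theta,\ a_n^{-2} + \varepsilon_n^2),
$$

so that

$$
\pi_{ABC}(\theta \mid s_{obs}, \varepsilon_n) = N(\theta;\ s_{obs},\ a_n^{-2} + \varepsilon_n^2), \qquad \pi(\theta \mid s_{obs}) = N(\theta;\ s_{obs},\ a_n^{-2}).
$$

The two posteriors share the same mean, but with $\varepsilon_n = c\,a_n^{-1}$ the ABC posterior variance is larger by the factor $1 + c^2$, so the ABC posterior overstates the uncertainty unless $\varepsilon_n = o(a_n^{-1})$.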
iting behaviour when $n$ goes to $\infty$. For two sets $A$ and $B$, the sum of integrals $\int_A f(x)\,dx +
\int_B f(x)\,dx$ is written as $(\int_A + \int_B) f(x)\,dx$. Recall that $T_{obs} = a_n A(\theta_0)^{-1/2}\{s_{obs} - s(\theta_0)\}$ and,
by (C3), $T_{obs} \to N(0, I_d)$, where $I_d$ is the identity matrix with dimension $d$. For a vector $v$,
denote any polynomial of the elements of $v$ with order up to $l$ by $P_l(v)$. We say a square
matrix $A$ is bounded if there exist constants $c_1$ and $c_2$ such that $c_1 < A < c_2$, and a
rectangular matrix $B$ is bounded if $B^TB$ is bounded. Let $\lambda_{\min}(A) = c_1$ and $\lambda_{\max}(A) = c_2$.

APPENDIX A: PROOF OF SECTION 3 

The proof of Theorem 3.1 proceeds as follows. The convergence of the MLES is given by
Lemma 1 and Lemma 2. For the convergence of the posterior and ABC posterior means,
divide $\mathbb{R}^p$ into $B_\delta = \{\theta : \|\theta - \theta_0\| < \delta\}$ and $B_\delta^c$ for some $\delta < \delta_0$. First, in $B_\delta^c$, Lemma
3 shows that the integration is ignorable. In $B_\delta$, the integral can be approximated by
replacing $f_n(s_{obs} \mid \theta)$ with a normal density, as suggested by (7). Then the expansions of the
mean and the normalising constant of the ABC posterior density, based on the analytical form
of the normal density, are given in Lemma 6, and the vanishing of the remainder terms is
supported by Lemma 4. Finally, the asymptotic distributions of the leading terms of the
expansion are obtained according to Lemma 5, which concludes the proof.

For MLES, [12] gives the central limit theorem for $\hat\theta_{MLES}$ when $a_n = \sqrt{n}$ and $\mathcal{P}$ is
compact. According to the proof in [12], extending the result to general $a_n$ is straightforward.
Additionally, we give the extension for general $\mathcal{P}$.

Lemma 1. Assume conditions (C1), (C3)-(C5). Then it holds that $a_n(\hat\theta_{MLES} - \theta_0) \to
N(0, I^{-1}(\theta_0))$ as $n \to \infty$.

Given the condition (C7), by Lemma 1 and the delta method, the convergence of MLES
for general $h(\theta)$ holds as follows.

Lemma 2. Assume the conditions of Lemma 1 and (C7). Then

$$
a_n\{h(\hat\theta_{MLES}) - h(\theta_0)\} \to N\{0,\ Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0)\} \quad \text{as } n \to \infty.
$$

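Now consider the integral $\pi(h) = \int h(\theta)\pi(\theta) f_{ABC}(s_{obs} \mid \theta)\,d\theta$. Under this notation, $h_{ABC} =
\pi(h)/\pi(1)$. For some $\delta < \delta_0$, decompose $\pi(h)$ into two parts,

$$
\pi_{B_\delta}(h) = \int_{B_\delta} h(\theta)\pi(\theta) f_{ABC}(s_{obs} \mid \theta)\,d\theta \quad \text{and} \quad \pi_{B_\delta^c}(h) = \int_{B_\delta^c} h(\theta)\pi(\theta) f_{ABC}(s_{obs} \mid \theta)\,d\theta.
$$

First of all, the following lemma shows that, for a fixed $\delta$, the integral in $B_\delta^c$ can be ignored.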

Lemma 3. Assume conditions (C2)(iii), (C4), (C5) and (C8). Then $\forall \delta > 0$, $\pi_{B_\delta^c}(h) =
O_p(e^{-a_{n,\varepsilon}^{\alpha_\delta} c_\delta})$ for some positive constants $c_\delta$ and $\alpha_\delta$ depending on $\delta$.

Proof. It is sufficient to show that supg gB c fABc(s o bs\0) = O p (e~ an ’ sCs ). Let Mg = 
min (AT ,5). By dividing into {v : ||e n u|| < Ms/ 2} and its complement, we have 


sup / fn{Sobs + £ n v\0)K(v) dv 

6eB%J Rrf 

< sup sup fn(s\6) + sup sup f n (s\0) + K {e~ l Ms/2)e~ d . 

6»eB|\yg \\s~s obs \\<M s /2 £>eyg \\s-s oba \\<M s /2 

In the above, as n —> oo, both the second and the third terms are exponentially decreasing 
by (C5) and (C2)(iii) respectively. For 9 E S|\1Pq, when ||s — s 0 6s|| < Ms/ 2, ||W n (s)|| > a n 5r 
for some constant r. Since fw n (w\0) is bounded by the sum of a normal density and r max (i «), 
sup 0gB c\yc sup|j s _ Sobs || <Mi / 2 fn(s\9 ) is also exponentially decreasing. Finally, the sum of all 
the above is 0(—a^ s £ cg) by noting that a n ^ £ < min(e“ 1 ,a n ). □ 


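Then we only need to consider the integration in $B_\delta$. Let $t(\theta)$ be the rescaled random
vector $a_{n,\varepsilon}(\theta - \theta_0)$ and $t(B_\delta)$ be the transformed $B_\delta$ under $t(\theta)$. This rescaling is useful in
the following Taylor expansion,

(5)
$$
\frac{\pi_{B_\delta}(h)}{\pi_{B_\delta}(1)} = h(\theta_0) + a_{n,\varepsilon}^{-1} Dh(\theta_0)^T \frac{\pi_{B_\delta}(t(\theta))}{\pi_{B_\delta}(1)} + \frac{1}{2} a_{n,\varepsilon}^{-2} \frac{\pi_{B_\delta}\{t(\theta)^T Hh(\theta_t)\, t(\theta)\}}{\pi_{B_\delta}(1)},
$$

where $\theta_t \in B_\delta$. Let $r_n(s \mid \theta)$ be the scaled remainder $a_n[f_n(s \mid \theta) - \tilde f_n(s \mid \theta)]$, where $\tilde f_n$
denotes the normal density approximating $f_n$. Let $\tilde f_{ABC}(s_{obs} \mid \theta) =
\int \tilde f_n(s_{obs} + \varepsilon_n v \mid \theta) K(v)\,dv$ and $\tilde\pi_{B_\delta}(h) = \int_{B_\delta} h(\theta)\pi(\theta)\tilde f_{ABC}(s_{obs} \mid \theta)\,d\theta$. Intuitively, $\pi_{B_\delta}(h)$ can
be approximated by $\tilde\pi_{B_\delta}(h)$ if their difference is small, written as the following

$$
\pi_{B_\delta}(P_l(t)) - \tilde\pi_{B_\delta}(P_l(t)) = a_n^{-1} a_{n,\varepsilon}^{-p} \int_{t(B_\delta)} \int P_l(t)\, \pi(\theta_0 + a_{n,\varepsilon}^{-1} t)\, r_n(s_{obs} + \varepsilon_n v \mid \theta_0 + a_{n,\varepsilon}^{-1} t)\, K(v)\,dv\,dt.
$$

For now we claim that

(6)
$$
\frac{\int_{t(B_\delta)} \int P_l(t, v)\, \pi(\theta_0 + a_{n,\varepsilon}^{-1} t)\, r_n(s_{obs} + \varepsilon_n v \mid \theta_0 + a_{n,\varepsilon}^{-1} t)\, K(v)\,dv\,dt}{\int_{t(B_\delta)} \pi(\theta_0 + a_{n,\varepsilon}^{-1} t)\, \tilde f_{ABC}(s_{obs} \mid \theta_0 + a_{n,\varepsilon}^{-1} t)\,dt} = O_p(1),
$$

and leave the proof to Lemma 7. Then it implies the expansion

(7)
$$
\pi_{B_\delta}(1) = \tilde\pi_{B_\delta}(1)\{1 + O_p(a_n^{-1})\} \quad \text{and} \quad \frac{\pi_{B_\delta}(P_l(t))}{\pi_{B_\delta}(1)} = \frac{\tilde\pi_{B_\delta}(P_l(t))}{\tilde\pi_{B_\delta}(1)} + O_p(a_n^{-1}).
$$

The following two lemmas are given to analyse the convolutions involved in $\tilde\pi_{B_\delta}(P_l(t))$.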


Lemma 4. Assume condition (C2). For $t \in \mathbb{R}^p$, let $A(t)$ be a $d \times p$ matrix function.
Let $c$ be a constant vector, $\{k_n\}$ be a series converging to $c_1 \in (0, \infty]$, and $\{b_n\}$ be a series
converging to a non-negative constant. Let $B_n = 1_{\{c_1 = \infty\}} + b_n 1_{\{c_1 < \infty\}}$. Assume $A(t)$ is
uniformly bounded in $\mathbb{R}^p$. For a density $g(v)$ in $\mathbb{R}^d$, if it satisfies

(i) $g(v) = g(\|v\|)$ and $g(v)$ is a decreasing function of $\|v\|$, and
(ii) $\int \prod_{k=1}^{l+p} v_{i_k}\, g(v)\,dv < \infty$ for any coordinates $(v_{i_1}, \dots, v_{i_{l+p}})$ of $v$ for some integer $l$,

then

$$
\int_{\mathbb{R}^p}\int_{\mathbb{R}^d} P_l(t, v)\, k_n^d\, g\{k_n[A(t)t - B_n v - k_n^{-1} c]\}\, K(v)\,dv\,dt = O(1),
$$

and

$$
\int_{\mathbb{R}^p}\int_{\mathbb{R}^d} k_n^d\, g\{k_n[A(t)t - B_n v - k_n^{-1} c]\}\, K(v)\,dv\,dt = \Theta(1).
$$

PROOF. Note that K(y) satisfies (i) and (ii). When c\ < oo, assume k n = 1 without 
loss of generality. Divide R p into V = {t : ||A(t)t||/2 > ||6 n v + c||} and V c . In V, || A(t)t — 
b n v — c|| > \\A(t)t\\/2] in V c , ||t|| < 2A m j n (^) _1 + c||. Note that Pi(t,v ) < f^(||t||, ||t>||) 

by Cauchy-Schwarz inequality. Then 


Pi(t,v)g{A(t)t - b n 


c)K(v) dvdt 


< [ [ Pi{t,v)g(\max{A)\\t\\/2)K(v)dvdt 

Jrp Jl R d 


+ c 2 sup g(v) 

v£R d 


Pi(2X min (A ) 1 \\b n v + c\\,v)\\b n v + c\\ p K(y) dvdt, 


= 0 ( 1 ), 


where c 2 is some constant. 

When ci = oo, let v* = k n ( A(t)t — v — k~ l c). Then 


/ / Pi(t,v)k^g(k n [A(t)t -v-k n 1 c])K(v) dvdt 

Jrp JR d 

■ f f Pi(t,v*)K(A(t)t — k~ l c — k~ 1 v*)g(v*) dv*dt : 

Jrp Jl R d 


which is 0(1) following the previous arguments. 

For Pi(t,v) = 1, by considering only the integral in a compact region, it is easy to see 
the target integral is larger than 0. Therefore the lemma holds. 

n 


Lemma 5. Consider the notations and assumptions of Lemma 4. Then

$$
\frac{\int t\, N(At;\ B_n v + k_n^{-1} c,\ k_n^{-2} I_d)\, K(v)\,dt\,dv}{\int N(At;\ B_n v + k_n^{-1} c,\ k_n^{-2} I_d)\, K(v)\,dt\,dv} = \frac{1}{k_n}\left\{(A^TA)^{-1}A^Tc + r(c; A, B_n, k_n)\, 1_{\{d > p\}}\right\}.
$$

It holds that (i) $r(c; A, B_n, k_n) = 0$ if $d = p$, $O(1)$ if $d > p$, and $o(1)$ if $B_n = o(1)$; (ii)
$r(0; A, B, k_n) = 0$.

The explicit expression of $r(c; A, B_n, k_n)$ is tedious and stated in the supplementary
material.
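
To see where the term $(A^TA)^{-1}A^Tc$ in Lemma 5 comes from, it may help to work through the special case $B_n = 0$ with a constant matrix $A$ of full column rank; this special case is an illustration added here, not part of the original proof. Completing the square in $t$ gives

$$
\|At - k_n^{-1}c\|^2 = (t - \hat t\,)^T A^TA\,(t - \hat t\,) + \|k_n^{-1}c - A\hat t\,\|^2, \qquad \hat t = k_n^{-1}(A^TA)^{-1}A^Tc,
$$

so that, as a function of $t$, $N(At;\ k_n^{-1}c,\ k_n^{-2}I_d)$ is proportional to a normal density with mean $\hat t$ and covariance $k_n^{-2}(A^TA)^{-1}$. Since $K(v)$ integrates to one and does not interact with $t$ when $B_n = 0$, the ratio of integrals equals

$$
\hat t = \frac{1}{k_n}(A^TA)^{-1}A^Tc,
$$

which is the least-squares projection of $c$ onto the column space of $A$, scaled by $k_n^{-1}$.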

Let $g_n(t, v) = g_n(t, v; 0)$, where $g_n(t, v; \tau)$ is defined in Section 4.3, and for a $d \times d$ matrix
$\tau_2$, let $g_n(t, v; \tau_1, \tau_2)$ be defined by replacing $A(\theta_0)$ in $g_n(t, v; \tau_1)$ by $(A(\theta_0)^{-1/2} + \tau_2)^{-2}$. The
following lemma gives expansions for the leading term of the normalising constant $\pi(1)$
and the posterior mean $\pi(h)/\pi(1)$, by showing that the leading term of $\pi(\theta) f_{ABC}(s_{obs} \mid \theta)$
is proportional to $g_n(t, v)$.


Lemma 6 . Assume conditions (C2)(ii), (C3), (C6) and (Cl). If e n = o(l/y/afO, then 
it holds that 

( 8 ) 


KBs (!) = On/ / 9n(t,v ) dtdv + O p (o n ^) + O p (a 2 n £ A n ) + OpCan 1 ) 

L Jt(Bg)x R d 




9n{t,v) dtdv = 0p(l), 


(9) 

and 


*B s (h ) " N , __i 


7TB, ( 1 ) 


= d(0 o ) + a„ e Dh{9 0 ) 


It(B s )xR d t9n(t,V) dtdv 
_ It(B s )xR d ffn(t,V ) dtdv 


+ Op(<Ve) + Op(« 2 e^,) + Opta^ 1 ) 


where t(Bg ) is the transformed B§ under t(0) = a n>£ (0 — 0 q). 


Proof. First consider 71^(1). By ( 7 ), it only needs to evaluate ttb s (!)• With the trans- 

ir(6 0 + a~\t)J n (s obs + £ n v\6 0 + a~\t)I<{v) dvdt 


formation t = t(0), 

(10) n Bs (l ) = a~* 


>t(B s )x I 

Since f n (s obs +£ n v\0o+a~ 1 £ t ) is analytically available, we can obtain an expansion of ttb s (1) 
by expanding f n (s obs + £ n v\0o + a~\t) as follows. The expansion needs to be discussed in 
two cases as the limit of a n e n being finite or infinite. 

When a n E n — > c £ < 00 , a Ut£ = a n . Applying the Taylor expansion twice on the exponen¬ 
tial term of f n {s obs + £ n v\0Q + a~( £ t), both on a" 1 , gives that 

fn(&obs T |0o T Ojj t)AT(y) 

= 1 IdnitiV) + a- 1 P 3 (t,v)g n (t,v;r 1 (^- ),r 2 (-t)) 

\A{0 O + a n i)| 1/2 L a n a n 

where = ^ria" 1 D t>2 (t, ei(t)) for some |ri| < 1, r 2 (^) = r 2 a^ 1 D A ,i{t, e 2 {t)) for 

some |r 2 | < 1, D< j2 (t, ei(t)) is the d-dimension vector with the ith element t 7 Hsi{0o+e\{t))t 
for some |ei(t)| < 6, D Ayl (t,e 2 (t)) = Yfk=i + £2 (*)) 1/2 4 for some |e 2 (t)| < 6, and

the coefficients of P 3 (t,v) are O p { 1). It can be seen that r\(t/a n ) and r 2 (t/a n ) are linear 
functions of t/a n with coefficients being O p { 1) and no constant terms, therefore can be 
arbitrarily small with small enough 5 since |t|/a n < S. 

When a n e n -A oo, a n , £ = e~ x . Letv* = A(9 0 ) 1/2 T obs +a n £ n v-a n £ n DS(0 0 )t, g*(t,v*;r 3 (£ n t),r 4 (£ n t)) 
be g n (t.v\ r 3 (e n t), r 4 (e n i)) transformed by v* and g^{t,v*) = g*(t,v*;0,0). Applying the 
Taylor expansion twice on the exponential term, firstly on e n and secondly on a n £ gives 
that 


fn(s obs + e n v\9 0 + £ n t)K(v ) 


,-d 


\a(« o)i 1/2 r 




1 


P 2 (t>>* +-Pi(t)v*t; 




CL-nSr 


9n(t,V* 


+{a n £ 2 n ) 2 P4(t,v*)g*(t,v*;r 3 (£ n t),r A (£ n t))} 


where r 3 (£ n t)t = |£„A, 2 (t, £i(t)) for some |r 3 | < 1 and r 4 (e n i) = r 4 £ n D AA (t,e 2 (t)) for 
some |r 4 | < 1. Similarly, r 3 (£ n t ) and r 4 (e n t) can be arbitrarily small with small enough 5 
since £ n t < 5. 

Then plugging the above expansions and the following Taylor expansion 

?r(0o + a~l e t) 7r(0 o ) , _i n 7r(0 o + e3(*)) * u ^ x 

— +a nF D— —— : - / 0 t, where |e 3 (t)| < o, 


\A(9 0 + \t)\W \A(0 q)\ 1 / 2 n ’ £ \A(9 0 + e 3 m 1/2 

into the expression ( 10 ) of 775 , 5 ( 1 ), it can be expanded as 

<^ b 5 (!) ~ it ( 0 o ) 


=a 


,-1 

n,e 


' g n (t,v)dtdv 

t(Bg)x R d 

\A(9q)\ 1 / 2 D + lo tg n (t,v) dtdv 


T CL, 

+ £n 
+ £n 


-1 


((BsJxR ' 1 \A(9 0 + e 3 (t))\ 1/2 

P 4 (t,v)g n (t,v;r 1 (a~^t),r 2 (a~ 2 t))dtdvl {ane=an} 

2 ..* / 


d^<5) x ® 

P 3 (t) / (a n£ „)V^(i,^)^dil {an£=£ - 1} 

/ TTD <7. L ) J 


't(B s ) 


P 2 (t)v*v* T g* n (t,v*) dtdv* 1 




{®n,e—£71 } 


(ID 


d" 


't(Bs)xl 


P 5 (t,v*)g* n (t,v*;r 3 (a 2 t),r A (a 2 t)) dtdv* l 


{&n,£—£n } 


In the five terms in the RHS of above, the first two terms are 0 p (a“)-) and O p (a~ 1 ) and 
the fifth term is 0 p (a^e^) by Lemma 4. The third term is O p (£ n ) by noting that let 
e k = ( 0 , • • • , 1 , ■ ■ ■ , 0 ) with 1 at the kth coordinate, 

roo POO 

/ v* k g* n (t,v*) dv* k = / v* k [g* n {t,v*) - g* n (t,v* - 2 v* k e k )\ dv* k 
is bounded by (a n e n ) 2 c due to the symmetry of N(v *; 0, Id). The fourth term is obviously
Op(s n ). Then since L B s)xR d g n (t,v) dtdv = 0 P (1) by Lemma 4, ( 8 ) holds. 

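Now consider $\pi_{B_\delta}(h)/\pi_{B_\delta}(1)$. By (5) and (7), we have

(12)
$$
\frac{\pi_{B_\delta}(h)}{\pi_{B_\delta}(1)} = h(\theta_0) + a_{n,\varepsilon}^{-1} Dh(\theta_0)^T\left[\frac{\tilde\pi_{B_\delta}(t(\theta))}{\tilde\pi_{B_\delta}(1)} + O_p(a_n^{-1})\right] + \frac{1}{2} a_{n,\varepsilon}^{-2}\left[\frac{\tilde\pi_{B_\delta}\{t(\theta)^T Hh(\theta_t)\, t(\theta)\}}{\tilde\pi_{B_\delta}(1)} + O_p(a_n^{-1})\right].
$$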


Since Q 2 {t) is a polynomial with bounded coefficients, where Q 2 (t) = t or tHh(0t)t, 

KBsiQzit)) can be expanded similarly as for ttb s ( 1)> simply multiplying g n (t,v ) and P±{t,v) 

in (11) with Q 2 (t). Then it holds that t^b s {Q 2 ^)) = ai~ e p ir(0 o ) f t ( Bs ) X R d Q 2 (t)g n (t,v) dtdv+ 

O p ( a ~l) + O p {a^£n) , and (9) holds by plugging this into the RHS of (12). Therefore the 
lemma holds. □ 

After obtaining the order of $\tilde\pi_{B_\delta}(1)$, (6) can be proved.

Lemma 7. Assume conditions (C2) and (C4). Then if $\varepsilon_n = o(1/\sqrt{a_n})$, (6) holds.

Proof. Letr Wn (w\0) = oi n [f Wn (w\0)-f Wn (w\0)], and we have r n (s|0) = a n \A(0)\- l / 2 r Wn {a n A{0)~ 1/2 {s- 
s(0))|0). Then since ir(0) and A(0) are bounded for 6 E Bs, it is sufficient to show that 


>KB S ). 


Pl{t. v)(a n d ne ) T max {a n a ne [Ds(0Q-\-ett)t a n ^ e E n v a n Oi n£ A{0Q) PT obs ])K(v) dvdt — O p (l), 


where the scalar et satisfying |e*| < a n \ and is from the Taylor expasion s(0q + a„|.f) = 
s(0 o) + a~\Ds(0 o + ett)t. Note that when c £ < oo, a n a~\ = 1 and a n ^e n —> c £ ; when 

□ 


c £ — oo, a n a n£ 


0 and a n>£ e n = 1. Therefore by Lemma 4, the lemma holds. 


Proof. [Proof of Theorem 3.1] Apply Lemma 5 to the integral in (9), where the
notations in Lemma 5 correspond to

$$
A = A(\theta_0)^{-1/2} Ds(\theta_0), \quad
B_n = \begin{cases} a_n\varepsilon_n A(\theta_0)^{-1/2}, & c_\varepsilon < \infty, \\ A(\theta_0)^{-1/2}, & c_\varepsilon = \infty, \end{cases} \quad
k_n = \begin{cases} 1, & c_\varepsilon < \infty, \\ a_n\varepsilon_n, & c_\varepsilon = \infty, \end{cases} \quad
\text{and } c = T_{obs}.
$$

It can be seen that the integral has the order $\Theta_p(a_{n,\varepsilon}/a_n)$. Since $\varepsilon_n = o(a_n^{-3/5})$ and $a_n^{-1} =
o(a_n^{-2/5})$, the other remainders are dominated. Then by Lemma 3 and Lemma 6, the leading
term of $a_n(h_{ABC} - h(\theta_0))$ is

(13)
$$
Dh(\theta_0)^T\{Ds(\theta_0)^T A(\theta_0)^{-1} Ds(\theta_0)\}^{-1} Ds(\theta_0)^T A(\theta_0)^{-1/2}\, T_{obs} + r_n(T_{obs}),
$$

where $r_n(T_{obs}) = r(T_{obs};\ A(\theta_0)^{-1/2} Ds(\theta_0),\ a_{n,\varepsilon}\varepsilon_n A(\theta_0)^{-1/2},\ a_n a_{n,\varepsilon}^{-1})$ and $r(c; A, B, k_n)$ is
defined in Lemma 5. $r_n(T_{obs})$ can be interpreted as the extra variation brought by $\varepsilon_n$, i.e.

(14)
$$
r_n(T_{obs}) = a_n(h_{ABC} - E[h(\theta) \mid s_{obs}]),
$$

since when $\varepsilon_n = 0$, $h_{ABC} = E[h(\theta) \mid s_{obs}]$ and $r_n(T_{obs}) = 0$.

By the delta method, (13) is asymptotically normal with mean $r_n(0)$ and some covariance
matrix, denoted by $I_{ABC}(\theta_0)$. Since $r_n(0) = 0$, the asymptotic normality in (ii) holds.
When $d = p$, since $r_n(T_{obs}) = 0$, $I_{ABC}(\theta_0) = I(\theta_0)$. When $d > p$, if $\varepsilon_n = o(1/a_n)$,
$r_n(T_{obs}) = o_p(1)$ and $I_{ABC}(\theta_0) = I(\theta_0)$ holds; if $a_n\varepsilon_n \to c_\varepsilon > 0$, $r_n(T_{obs})$ cannot be
ignored and $I_{ABC}(\theta_0)$ is not necessarily equal to $I(\theta_0)$. Since $I^{-1}(\theta_0)$ is the Cramer-Rao
lower bound, $I_{ABC}(\theta_0) \le I(\theta_0)$.

For (i), the asymptotic normality holds for $h(\theta)$ by Lemma 2. □


APPENDIX B: PROOF OF SECTION 4 

The proof of Proposition 4.1 follows the standard asymptotic argument of importance
sampling. For the detailed proof, see the supplementary material.

For simplicity, consider one-dimensional $h(\theta)$. Denote $(h(\theta) - h_{ABC})^2$ by $G_n(\theta)$. In Theorem
4.1 (i), $\Sigma_{IS,n}$ is just the ABC posterior variance of $h(\theta)$, and the derivation of its order is
similar to that of $h_{ABC}$ in Appendix A. The result is stated in the following lemma.


Lemma 8. Assume the conditions of Lemma 3. Then $\mathrm{Var}_{\pi_{ABC}}[h(\theta)] = O_p(a_{n,\varepsilon}^{-2})$.


PROOF. Using the notations of Appendix A, $\mathrm{Var}_{\pi_{ABC}}[h(\theta)] = \pi(G_n)/\pi(1)$. It follows
immediately from the arguments of Lemma 3 that $\mathrm{Var}_{\pi_{ABC}}[h(\theta)] = \pi_{B_\delta}(G_n)/\pi_{B_\delta}(1)\,\{1 +
o_p(1)\}$. For its leading term, under the transformation $t = t(\theta)$, a Taylor expansion of $h(\theta_0 +
a_{n,\varepsilon}^{-1} t)$ gives that

(15)
$$
\frac{\pi_{B_\delta}(G_n)}{\pi_{B_\delta}(1)} = G_n(\theta_0) + 2 a_{n,\varepsilon}^{-1}\{h(\theta_0) - h_{ABC}\}\frac{\pi_{B_\delta}(Dh(\theta_t)^T t)}{\pi_{B_\delta}(1)} + a_{n,\varepsilon}^{-2}\frac{\pi_{B_\delta}\{t^T Dh(\theta_t) Dh(\theta_t)^T t\}}{\pi_{B_\delta}(1)},
$$

where $\theta_t \in B_\delta$. In the above decomposition, $G_n(\theta_0)$ and $a_{n,\varepsilon}^{-1}\{h(\theta_0) - h_{ABC}\}$ are $O_p(a_{n,\varepsilon}^{-2})$
by Theorem 3.1. Then the lemma holds by an argument similar to that for (5). □


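PROOF. [Proof of Theorem 4.1] For (i), since $p_{acc,\pi} = \varepsilon_n^d \pi(1)$, by Lemmas 3, 4 and 6,
$p_{acc,\pi} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^{d-p})$ holds. Together with Lemma 8, (i) holds.

For (ii), if $p_{acc,q} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^d)$ holds, then by an alternative expression of $\Sigma_{ABC,n}$,

(16)
$$
\Sigma_{ABC,n} = p_{acc,\pi}^{-1}\, E_{\pi_{ABC}}\left[\{h(\theta) - h_{ABC}\}^2\, \frac{\pi(\theta)}{q(\theta)}\right],
$$

which can be verified easily by algebra, the order of $\Sigma_{IS,n}$ is obvious. Similar to the expansion
of $\pi(1)$,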










Pacc,q — ^ABc(fl\&obsi ^n) f ABc(&obs\0') d0 

an P eL B) TT{0 0 + a-lt)J A Bc{Sobs\Oo + a-l e t) 2 dt 

= 4 -—-~ 7TT-(1 + Op(l)). 

TbJI) 

The numerator of the above differs from tt Bs (1) by the square power of fABc{s o bs\0oPa~^t). 
If the numerator has the order 0 p (a 2rf e ), then p aC c,q = ®p(4 a n,e) would hold. By plugging 
in the expansions of f n {s 0 bs + £nV\Oo + a~\t) in the proof of Lemma 6, an expansion of 
the numerator similar to (11) can be obtained, and the leading term would be of the order 
®p(°nfe) ^ the followings hold, 


it(B s ) \JR d 


g n (t,v)dv ) dt is 0 P (1), / t ( / g n (t,v)dv) dt 

J Jt(Bs) \J R d / 


(17) and 


lt(B s ) \JR d 


P 4 (t.v)g n (t,v: r XA {a n ^t),r2A{a4 E t)) dv dt are O p ( 1) 


To show that all the above integrals are O p ( 1), we only need that f t ( Bs ) (/ K d P^it, v)g n {t , v) dv ) 2 
O p ( 1). For the third integral in the above, its proof is similar by using the technique in 
Lemma ??. For the simplicity of presentation, assume A(0o) = Id without loss of generality. 

The following arguments give the upper bound of f Rd P 4 (t. v)g n (t. v) dv. Consider the two 
cases of g n (t,v). When a n e n -A c < oo, let E\ = {v \ ||a ra £ n v|| 2 < ||_Ds(#o)£ -Tabs f/2}. 

Then we have 


P 4 (t,v)g n (t,v) dv 



<Piit) 


When a n e 


+ [ ) P 4 {t,v) 1 , exp{-h\Ds(0 o )t - T obs - a n s n v\\ 2 }K(v) dv 

JE\) (27 T) d / z 2 

n~> OO, let E 2 = {v ■- 11 1 > | j 2 < \\Ds(0o)t - (a n e n y 1 T obs || 2 /2}. Then we have 


Pi(t,v)g n {t,v) dv 


ie 2 


<P4(t) 


+ Q Ptit ' v, w 4 exp{ ~ 


\\Ds(0 o )t - —T obs - vf}K(v) dv 
—T obs || 2 } + K(h\Ds(0 o )t - —T obs || 2 ) 

^ Q j n£n 




















In both cases, by the inequality (a + b ) 2 < 2(a 2 + b 2 ) and plugging in the upper bound, it 
is easy to see that f t ( Bs) (f K d Pi(t,v)g n (t,v) dv ) 2 dt is O p ( 1). 

To see that the limit of f t ( Bs ) (f K d g n (t,v) dv) 2 dt is lower bounded away from 0, just 
using the positivity of the limit of the integrand and Fatou’s lemma. □ 


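Now let $w_n(\theta)$ be the importance weight $\pi(\theta)/q_n(\theta)$, and define $\pi_{B_\delta,IS}(h) = \int_{B_\delta} h(\theta)\pi(\theta) f_{ABC}(s_{obs} \mid \theta) w_n(\theta)\,d\theta$
and $\pi_{B_\delta^c,IS}(h)$ correspondingly. Then by the expression (16),

(18)
$$
\Sigma_{ABC,n} = p_{acc,\pi}^{-1}\, \frac{\pi_{B_\delta,IS}(G_n) + \pi_{B_\delta^c,IS}(G_n)}{\pi_{B_\delta}(1) + \pi_{B_\delta^c}(1)}.
$$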

Proof. [Proof of Theorem 4.2] For p a cc,q n , we only need to consider j3 = 1. Using the 
transformation t = t{9) = a n , £ {O-0o), since a n ^a n = 1, q n (6) = an, £ |E| 1 / 2 g(S _1 / 2 (t-c M )). 
Then similar to the approximation of vr(l), 


Pacc,q n 


= 4 J Qn(0) fABC (Sobs \B) dd 

= £ t<,e l s l 1/2 [ - c p ))f A Bc(s obs \e 0 + a~*t) dt( 1 + Op(l)). 


Plugging in the expansions of $f_{ABC}(s_{obs}\mid\theta_0+a_{n,\varepsilon}^{-1}t)$ in Lemma 6, we can obtain an expansion similar to (11), and it can be seen that $p_{acc,q_n} = \Theta_p(a_{n,\varepsilon}^d\varepsilon_n^d)$ if $\int_{\mathbb{R}^d\times t(B_\delta)} q(\Sigma^{-1/2}(t-c_\mu))g_n(t,v)\,dv\,dt = \Theta_p(1)$ and $\int_{\mathbb{R}^d\times t(B_\delta)} q(\Sigma^{-1/2}(t-c_\mu))g_n(t,v;r_1,r_2)\,dv\,dt = O_p(1)$, where $r_1^Tr_1 \le \delta I_p$ and $r_2^Tr_2 \le \delta I_d$. Noting that $q(\Sigma^{-1/2}(t-c_\mu))$ is bounded above for $t\in\mathbb{R}^p$, by Lemma ?? and Lemma 4, these two integrals are bounded above. By the positivity of the limit of the integrand and Fatou's lemma, the first integral is bounded below. Therefore $p_{acc,q_n} = \Theta_p(a_{n,\varepsilon}^d\varepsilon_n^d)$ holds.
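As a rough numerical illustration of why a proposal concentrated around the true parameter raises the acceptance probability, the sketch below compares rejection ABC under two proposals in a toy setting. Everything in it is an assumption made for illustration: a one-dimensional Gaussian summary $s\mid\theta\sim N(\theta,1/n)$ with $a_n=\sqrt{n}$, a uniform kernel scaled so that its maximum is 1, a standard normal prior, and a normal proposal with standard deviation $2/a_n$ centred at the observed summary. The function names are hypothetical and none of these choices come from the paper.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def acceptance_mc(prop_mean, prop_sd, s_obs, eps, n, draws, rng):
    """Monte Carlo acceptance rate of rejection ABC with a uniform kernel."""
    theta = rng.normal(prop_mean, prop_sd, size=draws)  # theta drawn from the proposal
    s = rng.normal(theta, 1.0 / math.sqrt(n))           # toy summary: s | theta ~ N(theta, 1/n)
    return float(np.mean(np.abs(s - s_obs) <= eps))     # accept when |s - s_obs| <= eps

def acceptance_exact(prop_mean, prop_sd, s_obs, eps, n):
    """Closed form for the toy model: marginally s ~ N(prop_mean, prop_sd^2 + 1/n)."""
    sd = math.sqrt(prop_sd ** 2 + 1.0 / n)
    upper = normal_cdf((s_obs + eps - prop_mean) / sd)
    lower = normal_cdf((s_obs - eps - prop_mean) / sd)
    return upper - lower

rng = np.random.default_rng(0)
n, s_obs = 10_000, 0.3
a_n = math.sqrt(n)
eps = 0.5 / a_n          # bandwidth chosen so that a_n * eps stays bounded
draws = 200_000

# Proposal 1: the (assumed) N(0, 1) prior. Proposal 2: scale of order 1 / a_n.
p_prior_mc = acceptance_mc(0.0, 1.0, s_obs, eps, n, draws, rng)
p_local_mc = acceptance_mc(s_obs, 2.0 / a_n, s_obs, eps, n, draws, rng)
p_prior = acceptance_exact(0.0, 1.0, s_obs, eps, n)
p_local = acceptance_exact(s_obs, 2.0 / a_n, s_obs, eps, n)

print("prior proposal:", p_prior_mc, "exact:", p_prior)
print("local proposal:", p_local_mc, "exact:", p_local)
```

In this toy case the prior proposal accepts at a rate of order $\varepsilon_n$, while the concentrated proposal accepts at a rate of order $a_n\varepsilon_n$, which matches the qualitative message of the bound for $p_{acc,q_n}$.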

For $\Sigma_{IS,n}$, by its definition and (18), we have

$$\Sigma_{IS,n} = p_{acc,q_n}\Sigma_{ABC,n} = \frac{p_{acc,q_n}}{p_{acc,\pi}}\,\frac{\pi_{B_\delta,IS}(G_n)}{\pi_{B_\delta}(1)}\,(1+o_p(1)),$$


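where the second equality holds since $\pi_{B_\delta^c,IS}(G_n)$ is negligible, by noting that $w_n(\theta) \le \beta^{-1}$ and using the arguments of Lemma 3. Given the orders of $p_{acc,q_n}$ and $p_{acc,\pi}$, which have already been obtained, in order for $\Sigma_{IS,n} = O_p(a_{n,\varepsilon}^{-2})$ it is enough that $\pi_{B_\delta,IS}(G_n)/\pi_{B_\delta}(1) = O_p(a_{n,\varepsilon}^{-2-p})$. Similar to (15), we have the following expansion

$$\frac{\pi_{B_\delta,IS}(G_n)}{\pi_{B_\delta}(1)} = G_n(\theta_0)\frac{\pi_{B_\delta,IS}(1)}{\pi_{B_\delta}(1)} + 2a_{n,\varepsilon}^{-1}\big(h(\theta_0)-h_{ABC}\big)\frac{\pi_{B_\delta,IS}(Dh(\theta_t)^Tt)}{\pi_{B_\delta}(1)} + a_{n,\varepsilon}^{-2}\frac{\pi_{B_\delta,IS}(t^TDh(\theta_t)Dh(\theta_t)^Tt)}{\pi_{B_\delta}(1)},$$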


and we only need $\pi_{B_\delta,IS}(P_2(t))/\pi_{B_\delta}(1) = O_p(a_{n,\varepsilon}^{-p})$. Since $w_n(\theta) \le (1-\beta)^{-1}w_{n,1}(\theta)$, where $w_{n,1}(\theta)$ is the weight when $\beta = 1$, it is sufficient to consider the case $\beta = 1$. By (7), $\pi_{B_\delta,IS}$ can be replaced by $\tilde\pi_{B_\delta,IS}$, in which $f_{ABC}(s_{obs}\mid\theta)$ is replaced by $\tilde f_{ABC}(s_{obs}\mid\theta)$. Using the transformation $t(\theta)$ and plugging in the expansions of $\tilde f_{ABC}(s_{obs}\mid\theta_0+a_{n,\varepsilon}^{-1}t)$, we have the following expansion similar to (11),


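$$a_{n,\varepsilon}^{[\dots]}\,\tilde\pi_{B_\delta,IS}(P_2(t)) = \pi(\theta_0)^2\int_{t(B_\delta)} P_2(t)\,\frac{\int_{\mathbb{R}^d} g_n(t,v)\,dv}{q(\Sigma^{-1/2}(t-c_\mu))}\,dt + \big(a_n^{-1}\mathbb{1}_{\{a_{n,\varepsilon}=a_n\}} + [\dots]\big)\int_{t(B_\delta)} \frac{\int_{\mathbb{R}^d} P_4(t,v)\,g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))\,dv}{|A(\theta_0+\epsilon_3(t))|^{1/2}\,q(\Sigma^{-1/2}(t-c_\mu))}\,dt.$$

Then we need to show that

$$\int_{t(B_\delta)} P_2(t)\,\frac{\int_{\mathbb{R}^d} g_n(t,v)\,dv}{q(\Sigma^{-1/2}(t-c_\mu))}\,dt \quad\text{and}\quad \int_{t(B_\delta)} P_2(t)\,\frac{\int_{\mathbb{R}^d} P_4(t,v)\,g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))\,dv}{q(\Sigma^{-1/2}(t-c_\mu))}\,dt \quad\text{are } O_p(1). \tag{19}$$

By (C10), for the first integral in the above, we have

$$\frac{\int_{\mathbb{R}^d} g_n(t,v)\,dv}{q(\Sigma^{-1/2}(t-c_\mu))} \le M_n\Big(\int_{\mathbb{R}^d} g_n(t,v)\,dv\Big)^{1-\alpha},$$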

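where $M_n$ is a scalar of order $O_p(1)$; for the second integral,

$$\frac{\int_{\mathbb{R}^d} P_4(t,v)\,g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))\,dv}{q(\Sigma^{-1/2}(t-c_\mu))}$$
$$\le M_n\Big(\int_{\mathbb{R}^d} P_4(t,v)\,\frac{g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))}{\int_{\mathbb{R}^d} g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))\,dv}\,dv\Big)\Big(\int_{\mathbb{R}^d} g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))\,dv\Big)^{1-\alpha}$$
$$= P_4(t)\Big(\int_{\mathbb{R}^d} g_n(t,v;r_{1,3}(a_{n,\varepsilon}^{-1}t),r_{2,4}(a_{n,\varepsilon}^{-1}t))\,dv\Big)^{1-\alpha}.$$

Then by the inequality $(a+b)^{1-\alpha} \le a^{1-\alpha}+b^{1-\alpha}$, for $\alpha\in(0,1)$ and $a, b > 0$, and following the arguments of (17), (19) holds. Therefore $\Sigma_{IS,n} = O_p(a_{n,\varepsilon}^{-2})$. □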
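The two weight bounds used in this proof can be checked numerically. The sketch below assumes a defensive-mixture proposal $q_n = \beta\pi + (1-\beta)\tilde q$, which is one form consistent with the bound $w_n(\theta)\le\beta^{-1}$; this form, the standard normal prior, the Cauchy component $\tilde q$ and all function names are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2)."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

def cauchy_pdf(x, loc=0.0, scale=1.0):
    """Density of a Cauchy(loc, scale), a heavy-tailed location-scale family."""
    z = (x - loc) / scale
    return 1.0 / (np.pi * scale * (1.0 + z ** 2))

def mixture_weight(theta, beta, loc, scale):
    """Importance weight pi / q_n for the assumed q_n = beta*pi + (1-beta)*q_tilde."""
    pi = normal_pdf(theta)                    # assumed prior density
    q_tilde = cauchy_pdf(theta, loc, scale)   # assumed concentrated component
    return pi / (beta * pi + (1.0 - beta) * q_tilde)

beta, loc, scale = 0.05, 0.5, 0.1
theta = np.linspace(-6.0, 6.0, 2001)

w = mixture_weight(theta, beta, loc, scale)
w_component = normal_pdf(theta) / cauchy_pdf(theta, loc, scale)  # weight under q_tilde alone

print("max weight:", w.max(), "bound 1/beta:", 1.0 / beta)
```

Under this assumed form, mixing in a small share of the prior caps the weights at $\beta^{-1}$, while the weights never exceed $(1-\beta)^{-1}$ times those of the pure component proposal, so it suffices to control the latter.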


Proof. [Proof of Proposition 4.2] 

With the notations of Lemma 4, it is sufficient to show that 


sup 

,t t t<5I p 



A -\- t 5 Bv + c, 




T(E-i/2 (t _ M);7) 


O p (l). 


This can be seen from inequality (7) in the supplementary material, the inequality $(a+b)^\alpha \le a^\alpha + b^\alpha$ and the assumption on $K(v)^\alpha$.

□ 
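The subadditivity inequality $(a+b)^\alpha \le a^\alpha + b^\alpha$ used in the last two proofs is standard. A short derivation is sketched below for $\alpha\in(0,1)$ and $a, b \ge 0$; the auxiliary symbols $x$, $y$ and $u$ are introduced here and are not part of the original.

```latex
% Claim: (a+b)^\alpha \le a^\alpha + b^\alpha for \alpha \in (0,1) and a, b \ge 0.
% The case a + b = 0 is trivial, so assume a + b > 0.
\begin{align*}
x &= \frac{a}{a+b}, \quad y = \frac{b}{a+b}, \qquad x, y \in [0,1], \quad x + y = 1, \\
x^\alpha + y^\alpha &\ge x + y = 1 \qquad (\text{since } u^\alpha \ge u \text{ for } u \in [0,1]), \\
a^\alpha + b^\alpha &= (a+b)^\alpha\,(x^\alpha + y^\alpha) \ge (a+b)^\alpha .
\end{align*}
```

Replacing $\alpha$ by $1-\alpha$ gives the version with exponent $1-\alpha$ used in the proof of Theorem 4.2.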